[LU-9688] Stuck MDT in lod_qos_prep_create Created: 19/Jun/17  Updated: 18/Jul/17  Resolved: 18/Jul/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Stephane Thiell Assignee: Niu Yawei (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

3.10.0-514.10.2.el7_lustre.x86_64, lustre-2.9.0_srcc6-1.el7.centos.x86_64


Attachments: Text File oak-io1-s1.lustre.log     Text File oak-io1-s2.lustre.log     Text File oak-md1-s1.foreach_bt.txt     Text File oak-md1-s1.lustre.log    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Our MDT has been stuck or barely usable twice in a row lately. The second time, we took a crash dump, which shows that several threads were blocked in lod_qos_prep_create...

PID: 291558  TASK: ffff88203c7b2f10  CPU: 9   COMMAND: "mdt01_030"
 #0 [ffff881a157f7588] __schedule at ffffffff8168b6a5
 #1 [ffff881a157f75f0] schedule at ffffffff8168bcf9
 #2 [ffff881a157f7600] rwsem_down_write_failed at ffffffff8168d4a5
 #3 [ffff881a157f7688] call_rwsem_down_write_failed at ffffffff81327067
 #4 [ffff881a157f76d0] down_write at ffffffff8168aebd
 #5 [ffff881a157f76e8] lod_qos_prep_create at ffffffffa124031c [lod]
 #6 [ffff881a157f77a8] lod_declare_striped_object at ffffffffa1239a8c [lod]
 #7 [ffff881a157f77f0] lod_declare_object_create at ffffffffa123b0f1 [lod]
 #8 [ffff881a157f7838] mdd_declare_object_create_internal at ffffffffa129d21f [mdd]
 #9 [ffff881a157f7880] mdd_declare_create at ffffffffa1294133 [mdd]
#10 [ffff881a157f78f0] mdd_create at ffffffffa1295689 [mdd]
#11 [ffff881a157f79e8] mdt_reint_open at ffffffffa1176f05 [mdt]
#12 [ffff881a157f7ad8] mdt_reint_rec at ffffffffa116c4a0 [mdt]
#13 [ffff881a157f7b00] mdt_reint_internal at ffffffffa114edc2 [mdt]
#14 [ffff881a157f7b38] mdt_intent_reint at ffffffffa114f322 [mdt]
#15 [ffff881a157f7b78] mdt_intent_policy at ffffffffa1159b9c [mdt]
#16 [ffff881a157f7bd0] ldlm_lock_enqueue at ffffffffa0b461e7 [ptlrpc]
#17 [ffff881a157f7c28] ldlm_handle_enqueue0 at ffffffffa0b6f3a3 [ptlrpc]
#18 [ffff881a157f7cb8] tgt_enqueue at ffffffffa0befe12 [ptlrpc]
#19 [ffff881a157f7cd8] tgt_request_handle at ffffffffa0bf4275 [ptlrpc]
#20 [ffff881a157f7d20] ptlrpc_server_handle_request at ffffffffa0ba01fb [ptlrpc]
#21 [ffff881a157f7de8] ptlrpc_main at ffffffffa0ba42b0 [ptlrpc]
#22 [ffff881a157f7ec8] kthread at ffffffff810b06ff
#23 [ffff881a157f7f50] ret_from_fork at ffffffff81696b98
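
For context on the trace above: frames #2-#4 show the thread parked in rwsem_down_write_failed(), i.e. lod_qos_prep_create() called down_write() on a rw_semaphore that another thread is holding for a long time. Below is a minimal userspace analogue of that blocking pattern, using a pthread rwlock instead of the kernel rw_semaphore; the thread and lock names are invented for illustration and are not taken from the Lustre sources.

/*
 * Userspace analogue of the pattern in the trace above: one thread holds the
 * lock while it waits on external work, and every later writer sleeps in the
 * equivalent of rwsem_down_write_failed(). All names here are illustrative.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t qos_rwlock = PTHREAD_RWLOCK_INITIALIZER;

/* Stand-in for a thread that is stuck waiting for OST object precreation
 * while still holding the semaphore. */
static void *slow_holder(void *arg)
{
    pthread_rwlock_wrlock(&qos_rwlock);
    printf("holder: acquired lock, now waiting on external work...\n");
    sleep(10);                            /* e.g. waiting for precreated objects */
    pthread_rwlock_unlock(&qos_rwlock);
    return NULL;
}

/* Stand-in for mdt threads entering lod_qos_prep_create(): they block here
 * until the holder releases the lock. */
static void *blocked_creator(void *arg)
{
    printf("creator %ld: waiting for write lock\n", (long)arg);
    pthread_rwlock_wrlock(&qos_rwlock);   /* sleeps, like down_write() */
    pthread_rwlock_unlock(&qos_rwlock);
    printf("creator %ld: finally got through\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t holder, creators[3];

    pthread_create(&holder, NULL, slow_holder, NULL);
    sleep(1);                             /* let the holder grab the lock first */
    for (long i = 0; i < 3; i++)
        pthread_create(&creators[i], NULL, blocked_creator, (void *)i);

    pthread_join(holder, NULL);
    for (long i = 0; i < 3; i++)
        pthread_join(creators[i], NULL);
    return 0;
}

The stuck mdt threads in the dump correspond to the blocked_creator() threads here: they cannot make progress until whatever the lock holder is waiting on completes.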



The disk array (from Dell) that we use for the MDT doesn't report any issue. The load was not particularly high. kmem -i does report 76 GB of free memory (60% of TOTAL MEM).

Attaching the output of `foreach bt`; maybe somebody will have a clue.

 

Each time, failing over the MDT resumed operations, but the recovery was a bit long, with a few evictions.

Lustre: oak-MDT0000: Recovery over after 13:39, of 1144 clients 1134 recovered and 10 were evicted.

Thanks!
Stephane



 Comments   
Comment by Alex Zhuravlev [ 20/Jun/17 ]

It's blocked by another thread that is waiting for OST objects. Please provide logs from the MDTs/OSTs if possible.

Comment by Stephane Thiell [ 20/Jun/17 ]

Hi Alex,

Thanks for the quick reply. That makes sense, because we had some issues with the OSS oak-io1-s1: it became unresponsive and we rebooted it on Jun 19 11:49:32 (you can see that in the logs; the OSTs were re-mounted at ~ Jun 19 12:00). Sorry I didn't mention that in the original ticket. So, I am attaching logs of the OSTs (OSS oak-io1-s1 and oak-io1-s2) and of the MDT (which was mounted on MDS oak-md1-s1). While preparing the logs, I noticed that the MDT log (file oak-md1-s1.lustre.log) contains errors about object precreation on one OST; could that be the issue?

Jun 19 11:47:23 oak-md1-s1 kernel: LustreError: 191781:0:(osp_precreate.c:615:osp_precreate_send()) oak-OST0016-osc-MDT0000: can't precreate: rc = -11
Jun 19 11:47:23 oak-md1-s1 kernel: LustreError: 191781:0:(osp_precreate.c:1243:osp_precreate_thread()) oak-OST0016-osc-MDT0000: cannot precreate objects: rc = -11
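
rc = -11 is -EAGAIN on Linux, so the OSP precreate thread keeps retrying the create request rather than failing it outright, while create threads on the MDT stay blocked waiting for objects. A small self-contained sketch of that retry behaviour follows; send_precreate() is a hypothetical stand-in, not the real osp_precreate.c code.

/* Sketch of why rc = -11 in the messages above means "not now, retry":
 * -11 is -EAGAIN on Linux. Function names are invented for illustration. */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for one precreate RPC to the OST. */
static int send_precreate(int attempt)
{
    /* Pretend the OST cannot allocate objects until its storage recovers. */
    return (attempt < 3) ? -EAGAIN : 0;
}

int main(void)
{
    int rc, attempt = 0;

    do {
        rc = send_precreate(attempt++);
        if (rc == -EAGAIN) {
            fprintf(stderr, "can't precreate: rc = %d, retrying\n", rc);
            sleep(1);             /* back off before the next attempt */
        }
    } while (rc == -EAGAIN);

    printf("precreate succeeded after %d attempts\n", attempt);
    return 0;
}

As long as the OST keeps answering -EAGAIN, the loop never produces an object, which matches the repeated "cannot precreate objects" messages above.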


Notes:
o2ib5 is the LNet network of the servers and a few clients.
o2ib, o2ib3 and o2ib4 are client-only networks.

Thanks,

Stephane

Comment by Peter Jones [ 21/Jun/17 ]

Niu

Can you please advise on this one?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 22/Jun/17 ]

Yes, the error message you mentioned is related to this issue: because precreate failed instantly, all create threads are blocked waiting for objects to be created.
I checked the OST log and found some md raid threads hung in md_update_sb() at that time; I think that could be the root cause. Has this problem disappeared?
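
To make the wait chain concrete: the mdt create threads are consumers of a pool of precreated OST objects, and the osp precreate thread is the only producer; if the producer cannot deliver because the OST's raid layer is stuck in md_update_sb(), every consumer sleeps until the storage (or a failover) unblocks the chain. A minimal producer/consumer sketch of that situation, with all names invented for illustration (this is not the actual Lustre code path):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  objects_available = PTHREAD_COND_INITIALIZER;
static int precreated_objects;            /* objects ready for new files */
static atomic_bool storage_healthy;       /* flips true once the array recovers */

/* Consumer: stands in for an mdt thread creating a new striped file. */
static void *create_thread(void *arg)
{
    pthread_mutex_lock(&lock);
    while (precreated_objects == 0)       /* blocks for as long as precreate fails */
        pthread_cond_wait(&objects_available, &lock);
    precreated_objects--;
    pthread_mutex_unlock(&lock);
    printf("create thread %ld: got an object\n", (long)arg);
    return NULL;
}

/* Producer: stands in for the osp precreate thread talking to one OST. */
static void *precreate_thread(void *arg)
{
    while (!atomic_load(&storage_healthy)) {  /* e.g. OST raid stuck in md_update_sb() */
        fprintf(stderr, "cannot precreate objects, retrying\n");
        sleep(1);
    }
    pthread_mutex_lock(&lock);
    precreated_objects += 32;
    pthread_cond_broadcast(&objects_available);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t producer, consumers[4];

    for (long i = 0; i < 4; i++)
        pthread_create(&consumers[i], NULL, create_thread, (void *)i);
    pthread_create(&producer, NULL, precreate_thread, NULL);

    sleep(3);                             /* consumers are stuck all this time */
    atomic_store(&storage_healthy, 1);    /* stands in for fixing the storage */

    pthread_join(producer, NULL);
    for (long i = 0; i < 4; i++)
        pthread_join(consumers[i], NULL);
    return 0;
}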

Comment by Stephane Thiell [ 22/Jun/17 ]

Hi Niu,

Thanks for looking at this. After making sure that the OSTs were OK and also failing over the MDT, the problem did not appear again. I'm just a bit concerned that the MDT couldn't recover by itself in that specific case.

Thanks,

Stephane

Comment by Niu Yawei (Inactive) [ 06/Jul/17 ]

Hi, Stephane

That's good news. If an OST fails to create objects due to a backend storage problem, creation on the MDT will be blocked; there isn't much we can do in that situation except wait for the storage to recover. Can we close this ticket now? Thanks.

Comment by Niu Yawei (Inactive) [ 18/Jul/17 ]

Bad disk, not a Lustre issue.
