[LU-13447] Sudden slow file create (MDS problem) Created: 10/Apr/20  Updated: 16/Jan/22  Resolved: 16/Jan/22

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.4
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Stephane Thiell Assignee: Mikhail Pershin
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

CentOS 7.6


Attachments: Text File fir-md1-s3_20200408_foreach_bt.txt, Text File fir-md1-s3_20200408_vmcore-dmesg.txt
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We hit a sudden MDS problem with 2.12.4, a few hours after LU-13442. Users reported slowness like this when creating new files:

sh02-ln03:bp86 09:03:28> time touch $SCRATCH/asdf

real 0m26.923s
user 0m0.000s
sys 0m0.003s
 

Looking at the MDS in question, we could see backtraces like these:

[1803096.937756] LNet: Service thread pid 41461 completed after 237.28s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
[1805310.391443] LNet: Service thread pid 20879 was inactive for 200.02s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[1805310.408554] Pid: 20879, comm: mdt03_010 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
[1805310.418900] Call Trace:
[1805310.421543]  [<ffffffffc1855fa8>] osp_precreate_reserve+0x2e8/0x800 [osp]
[1805310.428575]  [<ffffffffc184a949>] osp_declare_create+0x199/0x5f0 [osp]
[1805310.435312]  [<ffffffffc179269f>] lod_sub_declare_create+0xdf/0x210 [lod]
[1805310.442330]  [<ffffffffc178a86e>] lod_qos_declare_object_on+0xbe/0x3a0 [lod]
[1805310.449586]  [<ffffffffc178d80e>] lod_alloc_rr.constprop.19+0xeee/0x1490 [lod]
[1805310.457012]  [<ffffffffc179192d>] lod_qos_prep_create+0x12fd/0x1890 [lod]
[1805310.464007]  [<ffffffffc177296a>] lod_declare_instantiate_components+0x9a/0x1d0 [lod]
[1805310.472042]  [<ffffffffc1785725>] lod_declare_layout_change+0xb65/0x10f0 [lod]
[1805310.479468]  [<ffffffffc17f7f82>] mdd_declare_layout_change+0x62/0x120 [mdd]
[1805310.486724]  [<ffffffffc1800ec6>] mdd_layout_change+0xb46/0x16a0 [mdd]
[1805310.493473]  [<ffffffffc166135f>] mdt_layout_change+0x2df/0x480 [mdt]
[1805310.500130]  [<ffffffffc16697d0>] mdt_intent_layout+0x8a0/0xe00 [mdt]
[1805310.506787]  [<ffffffffc1666d35>] mdt_intent_policy+0x435/0xd80 [mdt]
[1805310.513459]  [<ffffffffc0ffbe06>] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
[1805310.520394]  [<ffffffffc10244f6>] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
[1805310.527688]  [<ffffffffc10acb12>] tgt_enqueue+0x62/0x210 [ptlrpc]
[1805310.534034]  [<ffffffffc10b564a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
[1805310.541167]  [<ffffffffc105843b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[1805310.549063]  [<ffffffffc105bda4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
[1805310.555596]  [<ffffffff9fac2e81>] kthread+0xd1/0xe0
[1805310.560688]  [<ffffffffa0177c24>] ret_from_fork_nospec_begin+0xe/0x21
[1805310.567340]  [<ffffffffffffffff>] 0xffffffffffffffff
[1805310.572544] LustreError: dumping log to /tmp/lustre-log.1586359839.20879
[1805316.373786] LNet: Service thread pid 20879 completed after 206.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
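The backtrace shows the mdt thread waiting in osp_precreate_reserve, i.e. waiting for precreated OST objects. For reference, the OSP precreate state for each OST can be checked on the MDS with something like the following (a sketch; parameter names may vary slightly across Lustre releases):

# On the MDS: per-OST precreate status as seen by this MDT
lctl get_param osp.*.prealloc_status
# Window of precreated object IDs still available to the MDT
lctl get_param osp.*.prealloc_last_id osp.*.prealloc_next_id

A status stuck at a non-zero value, or a precreate window that never advances, would point at the OST or the network rather than the MDT itself.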

We took a crash dump, which is available on the Whamcloud FTP as fir-md1-s3_20200408_vmcore.

Attaching the dmesg as fir-md1-s3_20200408_vmcore-dmesg.txt and the foreach bt output from the crash dump as fir-md1-s3_20200408_foreach_bt.txt.

A restart of the MDS fixed the issue for now.



 Comments   
Comment by Peter Jones [ 10/Apr/20 ]

Mike

Could you please advise?

Thanks

Peter

Comment by Mikhail Pershin [ 13/Apr/20 ]

The logs show a connectivity problem with the OSTs, so the MDT cannot create OST objects for the files being created. I am not sure whether that is a network problem or an OST server problem, but it is not an MDT issue. There are many messages like this:

[1767428.569313] LNetError: 20288:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
[1767428.579482] LNetError: 20288:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.0.10.109@o2ib7 (5): c: 0, oc: 0, rc: 8
[1767428.591908] LNetError: 20288:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.53@o2ib7 added to recovery queue. Health = 900
[1767428.604016] LNetError: 20288:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 5 previous similar messages

and as a result:

[1767243.181693] LustreError: 20702:0:(osp_precreate.c:970:osp_precreate_cleanup_orphans()) fir-OST0034-osc-MDT0002: cannot cleanup orphans: rc = -11
[1767244.194720] LustreError: 20702:0:(osp_precreate.c:970:osp_precreate_cleanup_orphans()) fir-OST0034-osc-MDT0002: cannot cleanup orphans: rc = -11
[1767327.245811] LustreError: 20710:0:(osp_precreate.c:970:osp_precreate_cleanup_orphans()) fir-OST0038-osc-MDT0002: cannot cleanup orphans: rc = -11
[1767341.462169] LustreError: 20714:0:(osp_precreate.c:970:osp_precreate_cleanup_orphans()) fir-OST003a-osc-MDT0002: cannot cleanup orphans: rc = -11
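If this recurs, connectivity and LNet health can be checked directly from the affected MDS, for example (a sketch; lnetctl output format varies by release):

# From the MDS: ping the OSS NID that timed out in the log above
lctl ping 10.0.10.109@o2ib7
# Local NI health/status (the log above shows the local NI health dropping to 900)
lnetctl net show -v
# Per-peer health and statistics
lnetctl peer show -v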
Comment by Andreas Dilger [ 14/Apr/20 ]

We shouldn't really be blocking the create if an OST is unreachable unless all of the OSTs are unreachable.
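In the meantime, if a single OST is known to be unreachable, it can be taken out of new-object allocation on the MDS so creates fall through to the remaining OSTs (a sketch, assuming the standard osp max_create_count parameter):

# On the MDS: check the current value, then stop allocating new objects on the affected OST
lctl get_param osp.fir-OST0034-osc-MDT0002.max_create_count
lctl set_param osp.fir-OST0034-osc-MDT0002.max_create_count=0
# Restore the previous value once the OST is reachable again (20000 is the usual default)
lctl set_param osp.fir-OST0034-osc-MDT0002.max_create_count=20000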

Comment by Stephane Thiell [ 20/Apr/20 ]

Thanks for taking the time to look at this! Still, it seems bizarre to me that this could be due to a network or OST-specific problem, as the other MDTs accessing the same OSTs seemed fine at the time, and all targets are connected to the same IB switch. Also, only one MDT (1 of 4) was affected. Anyway, this is the only occurrence of the problem we have seen so far, so I guess it's very rare. We'll report back if we see it again.
