[LU-12136] mdt threads blocked in mdd_create Created: 30/Mar/19  Updated: 10/Oct/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.6
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Stephane Thiell Assignee: Alex Zhuravlev
Resolution: Unresolved Votes: 1
Labels: None
Environment:

3.10.0-693.2.2.el7_lustre.pl1.x86_64


Attachments: LU-12136-OSS-logs.tar.gz, oak-md1-s2-kernel.log
Issue Links:
Related
is related to LU-10697 MDT locking issues after failing over... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We had an issue yesterday on Oak storage with Lustre 2.10.6. MDT0 didn't crash, but the filesystem got stuck. Several stack traces showed up on oak-md1-s2 (serving MDT0000). Note: Oak uses DNE1 and a second MDT (MDT0001) is mounted on oak-md1-s1, but I didn't find any stack trace on that one. A restart of MDT0000 fixed the issue (after a workaround to mitigate LU-8992).

My short-term plan is to upgrade Oak to 2.10.7 in a rolling fashion, but I thought it would be of interest to have a ticket to track this issue. I'm also attaching the kernel logs from this server as oak-md1-s2-kernel.log, where all stack traces can be seen.

First call trace was:

Mar 29 09:38:38 oak-md1-s2 kernel: INFO: task mdt00_003:3491 blocked for more than 120 seconds.
Mar 29 09:38:38 oak-md1-s2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 29 09:38:38 oak-md1-s2 kernel: mdt00_003       D ffffffff00000000     0  3491      2 0x00000080
Mar 29 09:38:38 oak-md1-s2 kernel: ffff88201e3f74b8 0000000000000046 ffff88201e3c3f40 ffff88201e3f7fd8
Mar 29 09:38:38 oak-md1-s2 kernel: ffff88201e3f7fd8 ffff88201e3f7fd8 ffff88201e3c3f40 ffff88201e3c3f40
Mar 29 09:38:38 oak-md1-s2 kernel: ffff88101fc13248 ffff88101fc13240 fffffffe00000001 ffffffff00000000
Mar 29 09:38:38 oak-md1-s2 kernel: Call Trace:
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffff816a94e9>] schedule+0x29/0x70
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffff816aadd5>] rwsem_down_write_failed+0x225/0x3a0
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffff81332047>] call_rwsem_down_write_failed+0x17/0x30
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffff816a87cd>] down_write+0x2d/0x3d
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc121b34f>] lod_alloc_qos.constprop.17+0x1af/0x1590 [lod]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0fa49a1>] ? qsd_op_begin0+0x181/0x940 [lquota]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0ed322f>] ? ldiskfs_xattr_ibody_get+0xef/0x1a0 [ldiskfs]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc12204d1>] lod_qos_prep_create+0x1291/0x17f0 [lod]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc1220bf9>] ? lod_prepare_inuse+0x1c9/0x2e0 [lod]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc1220f6d>] lod_prepare_create+0x25d/0x360 [lod]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc121578e>] lod_declare_striped_create+0x1ee/0x970 [lod]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc1217c04>] lod_declare_create+0x1e4/0x540 [lod]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc12828cf>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc1274023>] mdd_declare_create+0x53/0xe20 [mdd]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc1277ec9>] mdd_create+0x879/0x1400 [mdd]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc114ab93>] mdt_reint_open+0x2173/0x3190 [mdt]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0931dde>] ? upcall_cache_get_entry+0x20e/0x8f0 [obdclass]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc112fad3>] ? ucred_set_jobid+0x53/0x70 [mdt]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc113fa40>] mdt_reint_rec+0x80/0x210 [mdt]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc112131b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc1121842>] mdt_intent_reint+0x162/0x430 [mdt]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc112c5ae>] mdt_intent_policy+0x43e/0xc70 [mdt]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0afc12f>] ? ldlm_resource_get+0x9f/0xa30 [ptlrpc]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0af5277>] ldlm_lock_enqueue+0x387/0x970 [ptlrpc]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0b1e9e3>] ldlm_handle_enqueue0+0x9c3/0x1680 [ptlrpc]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0b46bc0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0ba3e92>] tgt_enqueue+0x62/0x210 [ptlrpc]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0ba7d95>] tgt_request_handle+0x925/0x1370 [ptlrpc]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0b50bf6>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0b4d228>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffff810c4822>] ? default_wake_function+0x12/0x20
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffff810ba588>] ? __wake_up_common+0x58/0x90
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0b54332>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffffc0b538a0>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
Mar 29 09:38:38 oak-md1-s2 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40


 Comments   
Comment by Alex Zhuravlev [ 01/Apr/19 ]

please, attach logs from OST.

Comment by Joseph Gmitter (Inactive) [ 03/Apr/19 ]

Hi Stephane,

Any luck in getting the OST logs as requested?

Thanks.

Joe

Comment by Stephane Thiell [ 03/Apr/19 ]

Hi Alex and Joe – I just attached the OSS kernel logs, thanks!
Note: We upgraded Oak servers to 2.10.7 last night.

Comment by Andreas Dilger [ 04/Oct/19 ]

Alex, did you ever get a chance to look into this code? It seems like we should not be holding this global mutex over RPC calls. If the OST is unavailable, or doesn't have any objects, then the call should trigger the precreate thread and return immediately, rather than blocking the MDS thread.
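
To make the contrast concrete, here is a minimal userspace model in C of the two behaviours described above. It is not the Lustre source and all names in it are hypothetical; it only illustrates why holding a single global write lock across the wait for precreated objects serializes every creating thread, whereas a fail-fast variant returns immediately and lets precreation refill the pool in the background.

/* Minimal userspace model, NOT Lustre code: all names are hypothetical. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t qos_rw_lock = PTHREAD_RWLOCK_INITIALIZER;
static int precreated_objects;          /* objects left in the "OST" pool */

/* Blocking pattern (what the stack traces show): the global write lock is
 * held across the wait for the pool to be refilled, so every other create
 * thread queues up behind it in the equivalent of down_write(). */
static void alloc_qos_blocking(void)
{
        pthread_rwlock_wrlock(&qos_rw_lock);
        while (precreated_objects == 0)
                sleep(1);               /* stands in for waiting up to obd_timeout */
        precreated_objects--;
        pthread_rwlock_unlock(&qos_rw_lock);
}

/* Fail-fast pattern (what the comment above suggests): take an object if one
 * is available, otherwise return at once so the caller can pick another OST
 * or wait outside the global lock while precreation runs in the background. */
static int alloc_qos_failfast(void)
{
        int rc = -1;

        pthread_rwlock_wrlock(&qos_rw_lock);
        if (precreated_objects > 0) {
                precreated_objects--;
                rc = 0;
        }
        pthread_rwlock_unlock(&qos_rw_lock);
        return rc;
}

int main(void)
{
        precreated_objects = 1;
        alloc_qos_blocking();                               /* pool non-empty: returns */
        printf("fail-fast rc = %d\n", alloc_qos_failfast());/* pool empty: -1, no wait */
        return 0;
}

The real allocator of course tracks per-OST state, QoS weights and retries; only the shape of the locking is the point here.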

Comment by Alex Zhuravlev [ 07/Oct/19 ]

Hmm, the cause of the reboot was an LBUG, AFAICS:

Mar 29 10:28:48 oak-md1-s2 kernel: LustreError: 118086:0:(osd_handler.c:1982:osd_object_release()) LBUG
Mar 29 10:28:48 oak-md1-s2 kernel: Pid: 118086, comm: tgt_recover_0 3.10.0-693.2.2.el7_lustre.pl1.x86_64 #1 SMP Sat Sep 30 00:44:25 PDT 2017
Mar 29 10:28:48 oak-md1-s2 kernel: Call Trace:
Mar 29 10:28:48 oak-md1-s2 kernel: [<ffffffff8103a212>] save_stack_trace_tsk+0x22/0x40
Mar 29 10:28:48 oak-md1-s2 kernel: [<ffffffffc08247cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Mar 29 10:28:48 oak-md1-s2 kernel: [<ffffffffc082487c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Mar 29 10:28:48 oak-md1-s2 kernel: [<ffffffffc0fe3660>] osd_get_ldiskfs_dirent_param+0x0/0x130 [osd_ldiskfs]
Mar 29 10:28:48 oak-md1-s2 kernel: [<ffffffffc096ac4d>] lu_object_put+0x17d/0x3d0 [obdclass]
Mar 29 10:28:48 oak-md1-s2 kernel: [<ffffffffc0c0e738>] distribute_txn_replay_handle+0x418/0xc00 [ptlrpc]
Mar 29 10:28:48 oak-md1-s2 kernel: [<ffffffffc0b54987>] replay_request_or_update.isra.21+0x247/0x8c0 [ptlrpc]
Mar 29 10:28:48 oak-md1-s2 kernel: [<ffffffffc0b55616>] target_recovery_thread+0x616/0x1360 [ptlrpc]
Mar 29 10:28:48 oak-md1-s2 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0

Comment by Alex Zhuravlev [ 07/Oct/19 ]

Andreas, normally we don't wait on an RPC in that path; we only do when a specific OST is out of precreated objects, and once we hit that case the calling thread waits for obd_timeout.
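
Sketched below (again as a userspace model with hypothetical names, not the actual OSP code) is that behaviour: in the common case the creating thread takes a precreated object without waiting at all, and only when a given OST's pool is empty does it block, for at most obd_timeout, until the precreate thread refills the pool.

/* Userspace sketch with hypothetical names, not Lustre code. */
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pool_refilled = PTHREAD_COND_INITIALIZER;
static int precreated_objects;
static const int obd_timeout = 100;     /* seconds; illustrative value */

/* Returns 0 if an object was taken, -ETIMEDOUT if the pool never refilled. */
static int take_precreated_object(void)
{
        struct timespec deadline;
        int rc = 0;

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += obd_timeout;

        pthread_mutex_lock(&pool_lock);
        while (precreated_objects == 0 && rc == 0)
                rc = pthread_cond_timedwait(&pool_refilled, &pool_lock,
                                            &deadline);
        if (rc == 0)
                precreated_objects--;   /* common case: no waiting at all */
        pthread_mutex_unlock(&pool_lock);
        return rc ? -ETIMEDOUT : 0;
}

/* The precreate thread calls this after creating a new batch of objects. */
static void add_precreated_objects(int count)
{
        pthread_mutex_lock(&pool_lock);
        precreated_objects += count;
        pthread_cond_broadcast(&pool_refilled);
        pthread_mutex_unlock(&pool_lock);
}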

Comment by Nathan Dauchy (Inactive) [ 23/Oct/19 ]

Any update on getting a fix for this issue?  We have hit it at NOAA at least twice, with no sign of any LBUG.  This is with lustre-2.10.7_ddn8.  Here is the call trace in case it helps:

Oct 21 23:08:11 lfs-mds-1-1 kernel: INFO: task mdt00_002:17259 blocked for more than 120 seconds.
Oct 21 23:08:11 lfs-mds-1-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 21 23:08:11 lfs-mds-1-1 kernel: mdt00_002       D ffff96927956d140     0 17259      2 0x00000000
Oct 21 23:08:11 lfs-mds-1-1 kernel: Call Trace:
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffa4b68e19>] schedule+0x29/0x70
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffa4b6a705>] rwsem_down_write_failed+0x225/0x3a0
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffa47874a7>] call_rwsem_down_write_failed+0x17/0x30
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffa4b6814d>] down_write+0x2d/0x3d
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc1624cef>] lod_alloc_qos.constprop.17+0x1af/0x1590 [lod]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc12d7a47>] ? qsd_op_begin0+0xc7/0x940 [lquota]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffa467966d>] ? __getblk+0x2d/0x300
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc1628d51>] lod_qos_prep_create+0x12b1/0x1810 [lod]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc1629479>] ? lod_prepare_inuse+0x1c9/0x2e0 [lod]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc16297ed>] lod_prepare_create+0x25d/0x360 [lod]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc161e11e>] lod_declare_striped_create+0x1ee/0x970 [lod]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc1620594>] lod_declare_create+0x1e4/0x550 [lod]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc168dc72>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc167d1a3>] mdd_declare_create+0x53/0xe40 [mdd]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc1681739>] mdd_create+0x9c9/0x1420 [mdd]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc1552685>] mdt_reint_open+0x2175/0x3190 [mdt]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc0ff2037>] ? lustre_msg_add_version+0x27/0xa0 [ptlrpc]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc1547403>] mdt_reint_rec+0x83/0x210 [mdt]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc152818b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc15286b2>] mdt_intent_reint+0x162/0x480 [mdt]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc152b50b>] mdt_intent_opc+0x1eb/0xaf0 [mdt]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc0ff4060>] ? lustre_swab_ldlm_policy_data+0x30/0x30 [ptlrpc]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc1533e70>] mdt_intent_policy+0x130/0x320 [mdt]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc0fa2226>] ldlm_lock_enqueue+0x396/0x960 [ptlrpc]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc0fcbcd7>] ldlm_handle_enqueue0+0xa97/0x1640 [ptlrpc]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc0ff40e0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc1054882>] tgt_enqueue+0x62/0x210 [ptlrpc]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc10587ba>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc0ffe12b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc0ffa758>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffa44d6af2>] ? default_wake_function+0x12/0x20
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffa44cbb8b>] ? __wake_up_common+0x5b/0x90
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc1001872>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffc1000de0>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffa44c1d21>] kthread+0xd1/0xe0
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffa44c1c50>] ? insert_kthread_work+0x40/0x40
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffa4b75c37>] ret_from_fork_nospec_begin+0x21/0x21
Oct 21 23:08:11 lfs-mds-1-1 kernel:  [<ffffffffa44c1c50>] ? insert_kthread_work+0x40/0x40

Comment by Stephane Thiell [ 23/Oct/19 ]

Hi Nathan,

Just FYI, we've not seen this problem on our 2.10 servers since we use 2.10.8 + 3 patches (https://github.com/stanford-rc/lustre/commits/b2_10_oak). I'm wondering if you have "LU-12018 quota: do not start a thread under memory pressure". It's a critical patch that is missing from 2.10.8 IMHO.

Comment by Gerrit Updater [ 30/Apr/20 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38428
Subject: LU-12136 osp: blocked-in-precreation stats
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a3c054c4b39c49a750666698c52385d4f7133a98

Comment by Alex Zhuravlev [ 10/Oct/22 ]

A number of improvements have been made in this area (LU-8367, LU-15082, LU-15724, LU-13073, LU-13513, LU-12397, etc.). It probably makes sense to check whether this is still a problem with the newer versions.

Comment by Stephane Thiell [ 10/Oct/22 ]

Thanks Alex! Agreed that this is an old ticket and is probably not relevant anymore.
