[LU-16588] lod doesn't include local MDT Created: 23/Feb/23  Updated: 23/Feb/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Story Priority: Minor
Reporter: Sergey Cheremencev Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-16501 QOS allocator not balancing space enough Resolved
Rank (Obsolete): 9223372036854775807

 Description   

While working on patch https://review.whamcloud.com/50074 "LU-16501 lod: add qos_ost_weights to debugfs", I found that lod->lod_mdt_descs.ltd_tgt_pool doesn't include the local MDT. For example, in a single-node setup with 2 MDTs, the LOD MDT descriptor (lod_tgt_descs) for lustre-MDT0000-mdtlov includes only MDT0001, while the one for lustre-MDT0001-mdtlov includes only MDT0000. Is this correct from a QOS point of view?
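
To make the observation concrete, here is a minimal Python model of which MDTs each LOD's target pool ends up tracking. It is only a sketch of the behaviour described above, not Lustre code: remote_mdt_pool() and mdt_name() are hypothetical helpers, and the only assumption taken from Lustre is the 4-digit hexadecimal target index in names such as lustre-MDT0001.

```python
def remote_mdt_pool(local_index, mdt_count):
    """Return the MDT indices a given MDT's LOD would track as targets.

    Models the observed behaviour only: every MDT except the local one.
    """
    return [idx for idx in range(mdt_count) if idx != local_index]


def mdt_name(fsname, index):
    # Target names carry a 4-digit hexadecimal index, e.g. lustre-MDT0001.
    return f"{fsname}-MDT{index:04x}"


# Two MDTs on a single node, as in the setup described above.
for local in range(2):
    pool = [mdt_name("lustre", i) for i in remote_mdt_pool(local, 2)]
    print(f"{mdt_name('lustre', local)}-mdtlov: {pool}")
```

With 2 MDTs, each pool holds exactly one entry, and it is never the local target.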



 Comments   
Comment by Patrick Farrell [ 23/Feb/23 ]

Well, since you can only have one stripe per MDT, the first stripe has already been placed by the time QOS selection runs on the local MDT: the local MDT holds the first stripe. So I think this is OK for QOS as long as there is only 1 stripe per MDT, because the local MDT isn't a valid choice for the remaining stripes.
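
A rough Python sketch of that reasoning, assuming one stripe per MDT: the first stripe is pinned to the local MDT, so only remote MDTs are candidates for the rest. pick_stripes() and the free-space numbers are hypothetical, and sorting by free space is a deterministic stand-in for the real weight-based QOS selection in lod_qos.c.

```python
def pick_stripes(local_index, free_space, stripe_count):
    """Choose MDT indices for a striped directory, one stripe per MDT.

    free_space maps MDT index -> free space (arbitrary units).
    The first stripe is always the local MDT; the remaining stripes are
    taken from the remote MDTs only, emptiest first.
    """
    if stripe_count < 1:
        raise ValueError("stripe_count must be at least 1")
    # The local MDT already holds the first stripe, so it is not a candidate.
    remote = [idx for idx in free_space if idx != local_index]
    # Most free space first; ties broken by index to stay deterministic.
    remote.sort(key=lambda idx: (-free_space[idx], idx))
    if stripe_count - 1 > len(remote):
        raise ValueError("not enough MDTs for one stripe per MDT")
    return [local_index] + remote[:stripe_count - 1]


# Hypothetical free space per MDT; MDT0000 is the local, fullest target.
free = {0: 10, 1: 80, 2: 50, 3: 90}
print(pick_stripes(0, free, 3))
```

Leaving the local MDT out of the candidate list changes nothing here, because it could not be picked a second time anyway.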

Of course, this causes problems for metadata overstriping (which is not complete yet), but I fixed them by special-casing the local MDT in QOS.
See https://review.whamcloud.com/c/fs/lustre-release/+/35034/12/lustre/lod/lod_qos.c

I didn't think it was realistic to add the local MDT to LOD. It seemed like too big of a change?

adilger or laisiyao might have more to say about how QOS works today.

Comment by Andreas Dilger [ 23/Feb/23 ]

It seems reasonable from a QOS point of view that the local MDT is also included in the QOS weighting, rather than always creating a stripe on the local MDT. Consider the common case where MDT0000 is fuller than the other MDTs (a filesystem predating MDT mkdir balancing): if a new striped directory is created on MDT0000, it will always have a local stripe. It would be better to create only the master stripe on the local MDT and all of the remote stripes on less-full remote MDTs. Of course, it would be better still if the master and the stripes were all created on remote MDTs, but this would still use an agent inode on MDT0000, so it isn't any more efficient.

I don't think this is a fatal problem, but it has come up at least twice now in different contexts, so it may be worthwhile to fix. The LOD itself has a connection to the local MDT, but this is treated somewhat differently since it is a local OSD device rather than a remote OSP device. It might make sense to include the local MDT in the QOS tables without changing the connections (which I think would be complex and bad for performance). The allocator should still prioritize the local MDT if its usage is reasonably close to that of the other MDTs, but skip it if the local MDT is significantly overused.
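
One possible shape for such a policy, as a hedged Python sketch rather than a proposal for the actual lod_qos.c code: use_local_mdt(), the percent-used inputs, and the 10-point threshold are all hypothetical and chosen only for illustration.

```python
def use_local_mdt(local_index, used_pct, threshold_pct=10.0):
    """Decide whether the local MDT should take a stripe.

    used_pct maps MDT index -> percent of space used.
    The local MDT is preferred unless its usage exceeds the average usage
    of the remote MDTs by more than threshold_pct percentage points.
    """
    remote = [pct for idx, pct in used_pct.items() if idx != local_index]
    if not remote:
        # Single-MDT filesystem: there is nothing else to choose from.
        return True
    remote_avg = sum(remote) / len(remote)
    return used_pct[local_index] - remote_avg <= threshold_pct


# Local MDT0000 is slightly fuller than the others: keep using it.
print(use_local_mdt(0, {0: 55.0, 1: 50.0, 2: 48.0}))
# Local MDT0000 is far fuller than the others: skip it.
print(use_local_mdt(0, {0: 90.0, 1: 40.0, 2: 30.0}))
```

A fixed threshold is the simplest choice. A real implementation would more likely fold the local MDT into the existing QOS weights, so that its preference shrinks gradually rather than switching off at one point.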

For the purpose of patch 50074, I think showing the current state of the QOS table without the local MDT is OK, since that reflects the actual implementation. If and when the local MDT is added to the QOS table, it should also appear in the QOS parameter output.

Generated at Sat Feb 10 03:28:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.