Details

    • Story
    • Resolution: Unresolved
    • Minor

    Description

      While working on patch https://review.whamcloud.com/50074 "LU-16501 lod: add qos_ost_weights to debugfs", I found that lod->lod_mdt_descs.ltd_tgt_pool doesn't include the local MDT. For example, in a single-node setup with 2 MDTs, the LOD MDT descriptor (lod_tgt_descs) for lustre-MDT0000-mdtlov includes only MDT0001, while the one for lustre-MDT0001-mdtlov includes only MDT0000. Is this correct from a QOS point of view?

          Activity

            [LU-16588] lod doesn't include local MDT

            It seems reasonable from a QOS point of view that the local MDT should also be included in the QOS weighting, rather than always creating a stripe on the local MDT. Consider the common case where MDT0000 is fuller than the other MDTs (a filesystem predating MDT mkdir balancing): if a new striped directory is created on MDT0000, it will always have a local stripe. It would be better to create only the master stripe on the local MDT and all of the remote stripes on less-full remote MDTs. Of course, it would be better still if the master and all stripes were created on the remote MDTs, but this would still use an agent inode on MDT0000, so it isn't any more efficient.

            I don't think this is a fatal problem, but it has come up at least twice now in different contexts, so it may be worthwhile to fix. The LOD itself has a connection to the local MDT, but this is treated somewhat differently since it is a local OSD device rather than a remote OSP device. It might make sense to include the local MDT in the QOS tables without changing the connections (which I think would be complex and bad for performance). The allocator should still prioritize the local MDT if its usage is within a reasonable range of the other MDTs, and skip it if the local MDT is significantly overused.

            For the purpose of patch 50074, I think showing the current state of the QOS table without the local MDT is OK, since that reflects the actual implementation. If/when the local MDT is added to the QOS table, it should also appear in the QOS parameter output.

            adilger Andreas Dilger added the comment above.
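
            The policy suggested in the comment above (prefer the local MDT unless it is significantly more used than the others) could be sketched roughly as follows. This is only an illustration, not Lustre code: `struct mdt_usage`, `local_mdt_overused()`, and the `threshold_pct` margin are hypothetical names, and comparing against the plain average of the remote MDTs is an assumption rather than how the LOD QOS code actually weighs targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-MDT usage record; not a Lustre structure. */
struct mdt_usage {
	uint32_t index;    /* MDT index, e.g. 0 for MDT0000 */
	uint64_t used_pct; /* percent of space in use, 0..100 */
};

/*
 * Return 1 if the local MDT should be skipped for a new stripe, 0 if it
 * should still be preferred.
 *
 * Assumption: "significantly overused" means the local MDT's usage exceeds
 * the average usage of the remote MDTs by more than threshold_pct points.
 */
static int local_mdt_overused(const struct mdt_usage *local,
			      const struct mdt_usage *remote,
			      size_t nr_remote, uint64_t threshold_pct)
{
	uint64_t sum = 0;
	size_t i;

	assert(local != NULL);

	/* With no remote MDTs there is nothing to compare against. */
	if (nr_remote == 0)
		return 0;

	for (i = 0; i < nr_remote; i++)
		sum += remote[i].used_pct;

	/* Integer average is enough for a coarse comparison. */
	return local->used_pct > sum / nr_remote + threshold_pct;
}
```

            With remote MDTs at 40% and 50% and a margin of 20 points, a local MDT at 90% would be skipped, while one at 55% would still be preferred.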

            Well, since you can only have one stripe per MDT, the first stripe has already been picked by the time you are on the local MDT.  The local MDT is the first stripe.  So I think it's OK for QOS as long as you only have 1 stripe per MDT, because the local MDT isn't a valid choice.

            Of course, this causes problems for metadata overstriping (which is not complete yet), but I fixed them by special-casing the local MDT in QOS.
            See https://review.whamcloud.com/c/fs/lustre-release/+/35034/12/lustre/lod/lod_qos.c

            I didn't think it was realistic to add the local MDT to LOD; it seemed like too big of a change...?

            adilger or laisiyao might have more to say about how QOS works today.

            paf0186 Patrick Farrell added the comment above.
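
            The argument in the comment above can be illustrated with a small sketch: when each MDT may hold at most one stripe, the local MDT already holds the first stripe and so is never a candidate for the remaining ones; once overstriping allows more than one stripe per MDT, it becomes a valid choice again. The helper names and the `max_per_mdt` parameter are hypothetical and are not taken from lod_qos.c or from the linked patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not Lustre code.
 * stripes_on_mdt[i] is the number of stripes of the directory being
 * created that have already been placed on MDT i.
 */

/* An MDT can take another stripe only while it is below the per-MDT limit. */
static int mdt_is_candidate(const uint32_t *stripes_on_mdt, size_t mdt_idx,
			    uint32_t max_per_mdt)
{
	assert(stripes_on_mdt != NULL);
	return stripes_on_mdt[mdt_idx] < max_per_mdt;
}

/* Count the MDTs that can still take a stripe. */
static size_t count_candidate_mdts(const uint32_t *stripes_on_mdt,
				   size_t nr_mdts, uint32_t max_per_mdt)
{
	size_t i, n = 0;

	for (i = 0; i < nr_mdts; i++)
		if (mdt_is_candidate(stripes_on_mdt, i, max_per_mdt))
			n++;
	return n;
}
```

            With three MDTs and the first stripe already on MDT 0 (the local MDT), a limit of one stripe per MDT leaves only the two remote MDTs as candidates, so their absence from the QOS table does not matter. Raising the limit to two makes the local MDT a candidate again, which is the overstriping case that needs special handling.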

            People

              wc-triage WC Triage
              scherementsev Sergey Cheremencev