Details

    • Improvement
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.5.0
    • None
    • 7025

    Description

      ldlm_poold currently wakes up every second and walks the list of all namespaces in the system doing some bookkeeping.
      On systems with a lot of servers this client list gets quite large, and it makes no sense to visit every namespace in the list if some of them don't actually hold any locks.

      As such I think it makes sense to have ldlm_poold iterate only over non-empty client namespaces.
      Estimates from Fujitsu indicate that on a system with 2000 OSTs just the list iteration (i.e. over empty namespaces) takes 2ms, which is probably excessive.
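
      As a rough illustration of where the time goes, here is a minimal userspace model of a poold-style thread (an editorial sketch with simplified names and structures, not the actual Lustre kernel code): a single thread wakes every second and walks every namespace, whether or not it holds any locks.

      #include <pthread.h>
      #include <unistd.h>

      #define NR_NAMESPACES 2000            /* e.g. one per OST on a large system */

      struct namespace {
              int nr_locks;                 /* 0 for most namespaces most of the time */
      };

      static struct namespace namespaces[NR_NAMESPACES];
      static pthread_mutex_t ns_list_lock = PTHREAD_MUTEX_INITIALIZER;

      static void pool_recalc(struct namespace *ns)
      {
              (void)ns;     /* the real code recalculates SLV/CLV, grant rates, etc. */
      }

      static void *poold_main(void *arg)
      {
              (void)arg;
              for (;;) {
                      pthread_mutex_lock(&ns_list_lock);
                      /* Visiting all entries costs time (roughly 2ms for 2000 OSTs
                       * per the estimate above) even when almost every namespace is
                       * empty and nothing needs recalculating. */
                      for (int i = 0; i < NR_NAMESPACES; i++)
                              pool_recalc(&namespaces[i]);
                      pthread_mutex_unlock(&ns_list_lock);
                      sleep(1);
              }
              return NULL;
      }

      int main(void)
      {
              pthread_t tid;

              pthread_create(&tid, NULL, poold_main, NULL);
              pthread_join(tid, NULL);      /* the toy model runs until killed */
              return 0;
      }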

      Attachments

        1. nopatch-idle-data
          0.4 kB
          James A Simmons
        2. patch-idle-5624-data
          0.3 kB
          James A Simmons
        3. patch-idle-5793-data
          0.3 kB
          James A Simmons

        Issue Links

          Activity

            [LU-2924] shrink ldlm_poold workload
            haasken Ryan Haasken added a comment -

            Oleg, in http://review.whamcloud.com/5624, why did you change the type of ldlm_srv_namespace_nr and ldlm_cli_namespace_nr from cfs_atomic_t to int?

            Now when we call ldlm_namespace_nr_*, are we expected to be holding the ldlm_namespace_lock()?
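
            (For illustration only, a sketch of the locking pattern the question is about, using simplified userspace names rather than the real Lustre definitions: if the counter is only ever modified while the namespace-list lock is held, it can be a plain int instead of an atomic, but readers then have to take the same lock to get a stable value.)

            #include <pthread.h>

            /* Hypothetical stand-ins for the Lustre lock and counter. */
            static pthread_mutex_t namespace_list_lock = PTHREAD_MUTEX_INITIALIZER;
            static int cli_namespace_nr;        /* plain int, guarded by the lock */

            void namespace_list_add(void)
            {
                    pthread_mutex_lock(&namespace_list_lock);
                    /* the namespace would also be linked onto the list here */
                    cli_namespace_nr++;
                    pthread_mutex_unlock(&namespace_list_lock);
            }

            int cli_namespace_count(void)
            {
                    int nr;

                    pthread_mutex_lock(&namespace_list_lock);
                    nr = cli_namespace_nr;
                    pthread_mutex_unlock(&namespace_list_lock);
                    return nr;
            }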


            jlevi Jodi Levi (Inactive) added a comment -

            Patches landed and test plan written.

            niu Niu Yawei (Inactive) added a comment -

            This reminds me of LU-1128; it looks like the pool shrinker on the server side can't work as expected (under memory pressure, the SLV isn't decreased and lock cancellation from the client is never triggered).

            Though this ticket is mainly for addressing the client-side issue.

            simmonsja James A Simmons added a comment -

            I tested both patches for this ticket on a system where I created 1800 virtual OSTs on four OSSes. So far I have managed to collect the data for when it was idle, 6 hours after it was mounted. 3 clients were used in the test: one client with no patch, a second client with only patch 5624, and a third client with both patches.
            green Oleg Drokin added a comment -

            My final idea for this (for later) is to further spread apart wakeups on clients by looking at the SLV and CLV and calculating when they will meet (we can update the CLV locally on every lock moving to the LRU, so we can more or less accurately guess the time left during recalc, with some margin, and then decide to sleep that long). If the server SLV drastically changes, we can wake poold ahead of the scheduled time if we want to (which sounds like a good idea anyway).
            In the extreme case this could lead to no ldlm_poold at all, and we could just do on-the-fly adjustments from RPC callbacks, like after-reply or similar.
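
            (A rough userspace sketch of the "sleep until SLV and CLV meet" idea above; the margin, clamping and names are editorial assumptions rather than the actual Lustre formulas.)

            #include <stdint.h>

            #define RECALC_MARGIN   2     /* wake a couple of seconds early */
            #define RECALC_MIN      1
            #define RECALC_MAX      60

            /* Guess how many seconds until the client lock volume meets the
             * server lock volume, given the most recently observed rate of
             * change, and clamp the result to a sane range. */
            long poold_next_wakeup(uint64_t slv, uint64_t clv,
                                   uint64_t change_per_sec)
            {
                    long secs;

                    if (clv >= slv || change_per_sec == 0)
                            return RECALC_MIN;    /* already met, or no estimate */

                    secs = (long)((slv - clv) / change_per_sec) - RECALC_MARGIN;
                    if (secs < RECALC_MIN)
                            secs = RECALC_MIN;
                    if (secs > RECALC_MAX)
                            secs = RECALC_MAX;
                    return secs;
            }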

            green Oleg Drokin added a comment -

            Another improvement idea: only wake up as frequently as makes sense, which means about once per 10 seconds on clients instead of once every second. This should further reduce the time ldlm_poold wastes, at the expense of some inaccuracy in grant rate calculations, but I don't really think that's such an important metric on the client side.

            patch in http://review.whamcloud.com/5793
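
            (A minimal sketch of a per-side recalc period along the lines described above; the interval values follow the comment, but the symbols are simplified and may not match the patch exactly.)

            enum ldlm_side {
                    LDLM_NAMESPACE_SERVER,
                    LDLM_NAMESPACE_CLIENT,
            };

            #define SRV_RECALC_PERIOD   1   /* seconds: grant stats matter on servers */
            #define CLI_RECALC_PERIOD  10   /* seconds: coarser is fine on clients    */

            static inline int pool_recalc_period(enum ldlm_side side)
            {
                    return side == LDLM_NAMESPACE_SERVER ?
                           SRV_RECALC_PERIOD : CLI_RECALC_PERIOD;
            }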

            green Oleg Drokin added a comment -

            That's a somewhat orthogonal, though related, issue.

            But those patches are just about shrinking.
            I am dealing with the ldlm_poold accounting thread: it is woken up once a second and iterates over the entire list of namespaces (on every node in the system), which adds (avoidable) OS jitter.

            adilger Andreas Dilger added a comment -

            LU-1520 and patch http://review.whamcloud.com/3859 and LU-607 / http://review.whamcloud.com/1334 are what I was thinking about.

            adilger Andreas Dilger added a comment -

            I think there is already a patch to fix this in Gerrit. Search for message:poold or similar.
            green Oleg Drokin added a comment -

            I have an idea for a patch at http://review.whamcloud.com/5624

            I tried adding debug info to it and doing some testing, and the namespaces seem to be entering and exiting the active list as expected.

            I suspect this should ease most of the pain on big systems, since they rarely (if ever) have outstanding locks to all OSTs present, so this should be a good first step in taming ldlm_poold.
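
            (A userspace sketch of the "only walk non-empty namespaces" idea, with simplified types and names rather than the actual change in the patch: a namespace goes onto the active list when it gets its first lock and comes off again when its last lock goes away, so the poold walk only ever sees namespaces that have work to do.)

            #include <pthread.h>
            #include <sys/queue.h>

            struct namespace {
                    int nr_locks;
                    LIST_ENTRY(namespace) ns_active;   /* linkage on the active list */
            };

            static LIST_HEAD(, namespace) active_namespaces =
                    LIST_HEAD_INITIALIZER(active_namespaces);
            static pthread_mutex_t ns_list_lock = PTHREAD_MUTEX_INITIALIZER;

            void namespace_lock_added(struct namespace *ns)
            {
                    pthread_mutex_lock(&ns_list_lock);
                    if (ns->nr_locks++ == 0)
                            LIST_INSERT_HEAD(&active_namespaces, ns, ns_active);
                    pthread_mutex_unlock(&ns_list_lock);
            }

            void namespace_lock_removed(struct namespace *ns)
            {
                    pthread_mutex_lock(&ns_list_lock);
                    if (--ns->nr_locks == 0)
                            LIST_REMOVE(ns, ns_active);
                    pthread_mutex_unlock(&ns_list_lock);
            }

            /* ldlm_poold then iterates over active_namespaces only, instead of
             * over every namespace in the system. */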


            People

              green Oleg Drokin
              green Oleg Drokin
              Votes: 0
              Watchers: 8
