Lustre / LU-1376

ldlm_poold noise on clients significantly reduces application performance

    Description

      Our users found that their application was scaling very poorly on our "zin" cluster. It is a Sandy Bridge cluster with 16 cores per node and roughly 3000 nodes. At relatively low node counts (512 nodes), they found that their performance on zin, now that it is on the secure network, is 1/4 of what it was when zin was on the open network.

      One of the few differences is that zin now talks to 3000+ OSTs on the secure network, whereas it only talked to a few hundred OSTs while it was being shaken down on the open network. One of our engineers noted that the ldlm_poold was frequently using 0.3% of CPU time on zin.

      The application in question is HIGHLY sensitive to system daemons and other CPU noise on the compute nodes because it is highly MPI-coordinated. I created the attached patch (ldlm_poold_period.patch) that allows me to change the sleep interval used by the ldlm_poold. Sure enough, if I change the sleep time to 300 seconds, the application's performance immediately improves by 4X.

      The ldlm_poold walking a list of 3000+ namespaces every second and doing nothing most of the time (because client namespaces are only actually "recalculated" every 10s) is a very bad design. The patch was just to determine if that was really the cause.
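
      As a rough illustration of the cost described above, the following sketch counts how many namespace visits a fixed-interval walker makes and how many of those visits actually recalculate anything. It is a hypothetical model, not Lustre code: the names walk_cost and poold_walk_cost are invented for this example, and it assumes all namespaces share one recalculation period and stay in sync.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not Lustre code. */
struct walk_cost {
    uint64_t visits;   /* namespaces touched by the daemon */
    uint64_t recalcs;  /* visits that actually recalculated a pool */
};

/*
 * Simulate a daemon that wakes every wake_period_s seconds and walks
 * nr_namespaces namespaces, each of which only recalculates once at
 * least recalc_period_s seconds have passed since its last recalc.
 */
struct walk_cost poold_walk_cost(uint64_t nr_namespaces,
                                 uint64_t wake_period_s,
                                 uint64_t recalc_period_s,
                                 uint64_t window_s)
{
    struct walk_cost c = { 0, 0 };
    uint64_t last_recalc_s = 0;
    uint64_t t;

    assert(wake_period_s > 0);
    assert(recalc_period_s > 0);

    for (t = wake_period_s; t <= window_s; t += wake_period_s) {
        c.visits += nr_namespaces;            /* the whole list is walked */
        if (t - last_recalc_s >= recalc_period_s) {
            c.recalcs += nr_namespaces;       /* useful work happens here */
            last_recalc_s = t;
        }
    }
    return c;
}
```

      Under these assumptions, with 3000 namespaces, a 1 second wakeup, and a 10 second recalculation period, one minute of wall time produces 180000 visits of which only 18000 do any work. With a 300 second wakeup, every visit in the model is a useful one.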

      I will now work on a real fix.

      I think that instead of making the ldlm_poold's sleep time configurable, I will make both LDLM_POOL_SRV_DEF_RECALC_PERIOD and LDLM_POOL_CLI_DEF_RECALC_PERIOD tunables. Then I will make the ldlm_poold sleep dynamically based on the next period in the list of namespaces... although I probably don't want each namespace to have its own starting time.

      I probably want to keep the recalculation periods of all namespaces of the same type (server or client) in sync, and then perhaps order the list as well to avoid walking the entire list unnecessarily.
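
      One way the approach above could look is sketched below. This is only an illustration under stated assumptions, not the eventual fix: pool_sched and poold_run_once are hypothetical names, each namespace type shares a single deadline (so no namespace has its own starting time), and the periods are plain fields standing in for the proposed tunables. List ordering is not modeled, since a shared per-type deadline already lets the daemon skip a type that is not yet due.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types for illustration only, not Lustre's. */
enum pool_side { POOL_SERVER = 0, POOL_CLIENT = 1, POOL_SIDES = 2 };

struct pool_sched {
    uint64_t period_s[POOL_SIDES];   /* tunable recalc period per type */
    uint64_t next_due_s[POOL_SIDES]; /* shared deadline per type */
    uint64_t nr_ns[POOL_SIDES];      /* namespaces of each type */
};

/*
 * Recalculate every type whose shared deadline has passed, report how
 * many namespaces were recalculated, and return how many seconds the
 * daemon may sleep before the earliest remaining deadline.
 */
uint64_t poold_run_once(struct pool_sched *s, uint64_t now_s,
                        uint64_t *recalculated)
{
    uint64_t earliest = UINT64_MAX;
    uint64_t idle = UINT64_MAX;      /* fallback when no namespaces exist */
    int side;

    *recalculated = 0;
    for (side = 0; side < POOL_SIDES; side++) {
        assert(s->period_s[side] > 0);
        if (s->period_s[side] < idle)
            idle = s->period_s[side];
        if (s->nr_ns[side] == 0)
            continue;                /* empty type never forces a wakeup */
        if (now_s >= s->next_due_s[side]) {
            *recalculated += s->nr_ns[side];
            s->next_due_s[side] = now_s + s->period_s[side];
        }
        if (s->next_due_s[side] < earliest)
            earliest = s->next_due_s[side];
    }
    return earliest == UINT64_MAX ? idle : earliest - now_s;
}
```

      In this sketch, a node with 3000 client namespaces, no server namespaces, and a 10 second client period is told to sleep for 10 seconds after each pass, rather than waking every second to find nothing due.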

      No work needed by Whamcloud right now, except perhaps to comment on my approach if you think there is something that I should be doing differently (or if there is already work in this area that I haven't found).

            People

              green Oleg Drokin
              morrone Christopher Morrone (Inactive)