
LU-4570: Metadata slowdowns on production filesystem at ORNL


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0
    • Affects Version/s: Lustre 2.4.1
    • Labels: None
    • Environment: RHEL 6.4, 2.6.32-358.18.1.el6.x86_64 kernel
    • Severity: 3
    • Rank (Obsolete): 12490

    Description

      Clients:
      Some 1.8.6 and 1.8.9 clients
      Majority 2.4.x clients

      Servers: Lustre 2.4.1, on the environment noted above

      Users are seeing metadata operations take several minutes (interactive performance is usually the canary in the coal mine). We are unable to enable DNE because of 1.8 client interactions and the need to keep the data available.

      We run periodic lctl dk dumps with +rpctrace enabled and can provide that information.

      To date we've found one application that was issuing ~30k LDLM_ENQUEUE RPCs in a 30-45s window (lctl dk overflowed, so we only got 45 of the 60s of trace data). We've put that application on hold, but are still seeing some slowdowns.
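      For reference, a rough sketch of how one of these trace windows gets captured (the buffer size, extra flag, and output path below are illustrative, not necessarily our exact settings):

        # add rpctrace (and, per question 3 below, possibly dlmtrace) to the debug mask
        lctl set_param debug=+rpctrace
        lctl set_param debug=+dlmtrace

        # enlarge the kernel debug buffer so a full 60s window is less likely to overflow
        # (512 MB is an example value, not what is currently configured)
        lctl set_param debug_mb=512

        # periodically dump the kernel debug buffer to a file for later analysis
        lctl dk /tmp/rpctrace-mds1.$(date +%s)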

      We have some "ls timers" scattered around the compute clusters and have been seeing responses in the 60-80s range for an ls of a pre-determined directory.
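      (The "ls timers" are just a periodic timed ls against a fixed canary directory, roughly like the sketch below; the path, interval, and log location are placeholders rather than our real configuration.)

        #!/bin/bash
        # minimal "ls timer" sketch: time an ls of a canary directory on a fixed interval
        CANARY=/lustre/atlas/canary     # placeholder path
        while true; do
            t0=$(date +%s)
            ls -l "$CANARY" > /dev/null
            t1=$(date +%s)
            echo "$(date +%FT%T) ls of $CANARY took $((t1 - t0))s" >> /var/log/ls-timer.log
            sleep 300                   # placeholder interval
        done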

      Questions:

      1. My current guess is that we are not getting any of the metadata improvements in 2.4. Can you confirm?

      2. We are running with options mdt mds_num_threads=1024 and are thinking of raising that to 2048 (the current setting is shown in the sketch after this list). Recommendations?

      3. What further data would be of interest in tracking this down? I have the raw rpctrace data. Would it be useful to add +nettrace or +dlmtrace and send that along? (I think +dlmtrace, but others may be good too)

      4. If we were to get rid of all the 1.8 clients, would the 2.4 metadata improvements be realized without a reboot of the servers? We are waiting on Cray to provide a software stack that supports 2.4 clients for some of our login servers, and are largely blocked on this.

      5. If we cannot get rid of the 1.8 clients, are there any other suggestions for increasing metadata throughput? We are not blocked on host I/O to the MDT device (less than 6 MB/s as recorded by iostat). We have had some past memory pressure on the MDS, and have recently reduced ldlm.namespaces.*.lru_size on the (non-Cray) cluster clients from the default of unlimited to 128 on compute nodes and 2000 on login nodes (see the sketch after this list), but in general the MDS host seems responsive and "normal".
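      For concreteness, the settings referenced in questions 2 and 5 look roughly like the sketch below; the modprobe file path is the usual convention rather than our exact layout, and the parameter names follow the Lustre 2.x manual:

        # MDS side: the service thread count is a module option,
        # e.g. in /etc/modprobe.d/lustre.conf (currently 1024, proposed 2048)
        options mdt mds_num_threads=1024

        # runtime view of how many MDT service threads are allowed/started
        lctl get_param mds.MDS.mdt.threads_min mds.MDS.mdt.threads_max mds.MDS.mdt.threads_started

        # client side: cap the DLM lock LRU instead of the unlimited default
        lctl set_param ldlm.namespaces.*.lru_size=128     # compute nodes
        lctl set_param ldlm.namespaces.*.lru_size=2000    # login nodes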

      Happy to provide more information; just ask and we'll get it to you.

      Thanks!


      -Jason

      Attachments

        1. atlas-mds1-sysrq-t.tgz
          1.24 MB
        2. ldlm_lockd.c.lustrekernel.txt
          69 kB
        3. rhea-lustre-logs-LU4570.tgz
          102 kB
        4. rpctrace-mds1.bz2
          0.2 kB
        5. titan-lustre-logs-LU4570.tgz
          1.63 MB


      People

        Assignee: green (Oleg Drokin)
        Reporter: hilljjornl (Jason Hill, Inactive)
        Votes: 0
        Watchers: 25
