Lustre / LU-3551

Process starvation during high contention metadata operation

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • None
    • None
    • None
    • Environment: 64 clients, 1 MDS, 38 OSTs
    • 3
    • 8938

    Description

      While running mdtest, some "unfair" I/O was observed on the system.

      There were 1024 tasks on 64 clients. Each task ran mdtest -n 1953, for a total of about 2 million operations (1024 × 1953 = 1,999,872) across all tasks against the single file. This was a shared-single-file (SSF) run with mdtest.

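      On one of the clients, CPU time was distributed very unevenly among the mdtest processes. All of them started at 12:51, yet their cumulative CPU time (the TIME column) ranges from 1:41 to 33:09: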

      root      36341  6.3  0.1 362484 76076 ?        DLl  12:51   2:41 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36342 10.5  0.0 262852 11344 ?        DLl  12:51   4:29 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36343 73.8  0.0 262852 11340 ?        DLl  12:51  31:19 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36344  6.2  0.0 262852 11340 ?        DLl  12:51   2:40 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36345 19.2  0.0 262916 11440 ?        DLl  12:51   8:10 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36346  6.4  0.0 262916 11444 ?        DLl  12:51   2:44 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36347 33.9  0.0 262916 11440 ?        DLl  12:51  14:24 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36348  5.3  0.0 262916 11440 ?        DLl  12:51   2:16 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36349 20.7  0.0 262916 11440 ?        DLl  12:51   8:47 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36350 26.9  0.0 262916 11436 ?        DLl  12:51  11:26 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36351  4.7  0.0 262916 11440 ?        DLl  12:51   2:01 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36352 36.1  0.0 262916 11448 ?        RLl  12:51  15:20 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36353 36.5  0.0 262916 11440 ?        DLl  12:51  15:30 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36354 27.7  0.0 262916 11432 ?        DLl  12:51  11:47 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36355 36.5  0.0 262916 11452 ?        DLl  12:51  15:29 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36356 78.1  0.0 262916 11448 ?        SLl  12:51  33:09 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36357 36.5  0.0 263076 11520 ?        DLl  12:51  15:29 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36358  4.0  0.0 263076 11532 ?        DLl  12:51   1:41 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36359 47.2  0.0 263076 11528 ?        DLl  12:51  20:02 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
      root      36360  9.8  0.0 263072 11528 ?        DLl  12:51   4:10 /opt/mdtest-1.8.3/bin/mdtest -d /p/l_wham/d0.mdtest -i 1 -n 1953
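
The spread in the listing above can be put into numbers. The following is a minimal sketch, not part of the original report. It assumes the listing uses the usual `ps aux` column layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND), which the report does not state explicitly, and that TIME is a colon-separated value such as `31:19`. The helper names (`cpu_seconds`, `parse_ps_line`, `jain_index`) are hypothetical. Jain's fairness index is used here only as one convenient summary: it equals 1.0 when every process gets the same CPU time and approaches 1/n when a single process gets nearly all of it.

```python
# PID -> cumulative CPU time, copied from the TIME column of the listing above.
CPU_TIMES = {
    36341: "2:41", 36342: "4:29", 36343: "31:19", 36344: "2:40",
    36345: "8:10", 36346: "2:44", 36347: "14:24", 36348: "2:16",
    36349: "8:47", 36350: "11:26", 36351: "2:01", 36352: "15:20",
    36353: "15:30", 36354: "11:47", 36355: "15:29", 36356: "33:09",
    36357: "15:29", 36358: "1:41", 36359: "20:02", 36360: "4:10",
}


def cpu_seconds(time_field):
    """Convert a colon-separated TIME field ('MM:SS' or 'HH:MM:SS') to seconds."""
    seconds = 0
    for part in time_field.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_ps_line(line):
    """Return (pid, cpu_seconds) from one line, assuming `ps aux` column order."""
    fields = line.split(None, 10)  # the 11th field is the full command line
    return int(fields[1]), cpu_seconds(fields[9])


def jain_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    total = sum(values)
    squares = sum(v * v for v in values)
    return (total * total) / (len(values) * squares)


seconds = {pid: cpu_seconds(t) for pid, t in CPU_TIMES.items()}
busiest = max(seconds, key=seconds.get)
idlest = min(seconds, key=seconds.get)

print(f"most CPU time:  PID {busiest} ({seconds[busiest]} s)")
print(f"least CPU time: PID {idlest} ({seconds[idlest]} s)")
print(f"max/min ratio:  {seconds[busiest] / seconds[idlest]:.1f}")
print(f"Jain index:     {jain_index(list(seconds.values())):.2f}")
```

Running the same calculation on listings taken at intervals during a run would show whether the same PIDs stay starved or whether the imbalance moves between processes.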
      

      There is not a lot of debug information for this issue yet, and it will need to be revisited on the latest master. This is just a placeholder for later investigation.

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Keith Mannthey (keith, inactive)
            Votes: 0
            Watchers: 4
