Lustre / LU-8246

Leak in the ldlm granted locks counter on the MDS leads to a lock cancellation loop


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.5.3, Lustre 2.8.0
    • Labels: None
    • Severity: 3

    Description

      A performance problem at one of our customers led us to find that the granted ldlm locks counter (found in /proc/fs/lustre/ldlm/namespaces/mdt-fsname-MDT0000_UUID/pool/granted) misses some decrements under conditions that are yet to be determined.

      Over time, this counter grows far beyond the value found in /proc/fs/lustre/ldlm/namespaces/mdt-fsname-MDT0000_UUID/pool/limit.

      See here:

      [root@prolixmds1 pool]# pwd
      /proc/fs/lustre/ldlm/namespaces/mdt-scratch-MDT0000_UUID/pool
      [root@prolixmds1 pool]# cat limit
      3203616
      [root@prolixmds1 pool]# cat granted
      54882822
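
      The manual check above can be automated. The following is a minimal sketch, not part of the original report: it assumes that each namespace directory contains a pool subdirectory with single-integer granted and limit files, as in the session above. The function names are hypothetical.

```python
from pathlib import Path


def read_pool_counters(pool_dir):
    """Return (granted, limit) from the two single-integer files in pool_dir."""
    pool = Path(pool_dir)
    granted = int((pool / "granted").read_text().strip())
    limit = int((pool / "limit").read_text().strip())
    return granted, limit


def find_overcommitted_pools(namespaces_root="/proc/fs/lustre/ldlm/namespaces"):
    """List (namespace, granted, limit) for every pool where granted > limit."""
    suspects = []
    for pool_dir in sorted(Path(namespaces_root).glob("*/pool")):
        try:
            granted, limit = read_pool_counters(pool_dir)
        except (OSError, ValueError):
            # Unreadable or non-numeric entry: skip it rather than guess.
            continue
        if granted > limit:
            suspects.append((pool_dir.parent.name, granted, limit))
    return suspects


if __name__ == "__main__":
    for name, granted, limit in find_overcommitted_pools():
        ratio = granted / limit if limit else float("inf")
        print(f"{name}: granted={granted} limit={limit} ratio={ratio:.1f}")
```

      With the values shown above, the counter is roughly 17 times the limit (54882822 / 3203616).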
      

      However, summing the granted locks as seen by all the clients gives only about 16k locks, which is also consistent with the slab consumption on the MDS.
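
      The report does not say how the client-side total was gathered. As one possible approach, the sketch below sums lock counts from text in the name=value form printed by lctl get_param ldlm.namespaces.*.lock_count, collected from each client by whatever means is available. The function name and the sample lines are hypothetical.

```python
def sum_lock_counts(lctl_output):
    """Sum every '<name>.lock_count=<n>' entry found in the given text."""
    total = 0
    for line in lctl_output.splitlines():
        name, sep, value = line.strip().partition("=")
        value = value.strip()
        # Ignore anything that is not a well-formed lock_count line.
        if sep and name.endswith(".lock_count") and value.isdigit():
            total += int(value)
    return total


if __name__ == "__main__":
    # Hypothetical sample output concatenated from two clients.
    sample = (
        "ldlm.namespaces.scratch-MDT0000-mdc-ffff880000000001.lock_count=120\n"
        "ldlm.namespaces.scratch-MDT0000-mdc-ffff880000000002.lock_count=80\n"
    )
    print(sum_lock_counts(sample))
```

      Comparing this total with the pool/granted value on the MDS shows how far the server-side counter has drifted from the locks the clients actually hold.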

      Once the counter is above the limit, the MDS constantly tries to cancel locks, even those that have not exceeded max_age. Clients then reacquire the locks but lose time in the process, which is what shows up as the performance problem.

      Note that since only the counter is wrong, there is no resource overconsumption tied to this problem.
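
      To illustrate the symptom only, here is a toy model and not Lustre code: a counter that occasionally skips its decrement drifts upward even though every lock is really released. The conditions that trigger the missed decrement are unknown, so the leak_every parameter and all names below are purely illustrative assumptions.

```python
class ToyPool:
    """Toy lock pool: 'granted' is the counter, 'locks' is what really exists."""

    def __init__(self, limit):
        self.limit = limit
        self.granted = 0
        self.locks = set()

    def grant(self, lock_id):
        self.locks.add(lock_id)
        self.granted += 1

    def release(self, lock_id, skip_decrement=False):
        # The lock is always freed; only the counter update may be skipped.
        self.locks.discard(lock_id)
        if not skip_decrement:
            self.granted -= 1

    def over_limit(self):
        return self.granted > self.limit


def simulate(rounds, locks_per_round, leak_every, limit):
    """Grant and release locks, skipping the decrement for every leak_every-th id."""
    pool = ToyPool(limit)
    next_id = 0
    for _ in range(rounds):
        ids = range(next_id, next_id + locks_per_round)
        next_id += locks_per_round
        for lock_id in ids:
            pool.grant(lock_id)
        for lock_id in ids:
            pool.release(lock_id, skip_decrement=(lock_id % leak_every == 0))
    return pool


if __name__ == "__main__":
    pool = simulate(rounds=100, locks_per_round=100, leak_every=10, limit=500)
    print(pool.granted, len(pool.locks), pool.over_limit())
```

      In this model no lock remains allocated at the end, yet the counter sits above the limit, so any policy that compares the counter with the limit keeps requesting cancellations.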

      We found that this problem is also seen on 2.8.
      Can you help find where the leak comes from?

      I also wonder if there is any relation with the last comment from Shuichi Ihara in LU-5727.
      I also think Christopher Morrone pointed this out here


            People

              Assignee: green Oleg Drokin
              Reporter: spiechurski Sebastien Piechurski