Lustre / LU-19264

LRU Inconsistency in Client-Server Distributed Lock Caching


Details

    • Type: New Feature
    • Resolution: Unresolved
    • Medium
    • 3

    Description

      When reclaiming locks on the server side, the server uses a Least Recently Used (LRU) algorithm based on each lock's last-used time. However, when a client finds a compatible lock already cached in its local lock namespace, it can use that lock directly without contacting the lock server. In this case, only the client-side lock's last-used time is updated; the LRU timestamp of the corresponding server-side lock remains unchanged.

      In large-scale Lustre systems with hundreds or thousands of clients, the server-side lock namespace must track every lock granted to those clients, and all of these locks are held in server memory. Because server memory is limited, lock reclaim is eventually triggered, and at that point the server's LRU timestamps may not accurately reflect recent usage.
      This can lead to premature reclaim of locks that are still in active use, or that clients are likely to reuse repeatedly in the near future, resulting in unnecessary lock communication (RPC) overhead and degraded system performance.

      In short, the core issue is that when clients use locally cached locks, they update their local LRU timestamps but do not propagate this information to the server. The server's LRU list therefore becomes stale, and under memory pressure the server may revoke locks that clients are still actively using. This could cause significant performance problems in large-scale deployments with hundreds of clients.

      This is a fundamental distributed-systems issue: client-side caching hides lock usage from the server, so server-side lock management decisions are based on inaccurate information.

      Attachments

        1. Figure_max_age.png (547 kB)
        2. Figure01-[2-4].png (317 kB)
        3. Figure01-1.png (89 kB)
        4. Figure02.png (173 kB)
        5. Figure03.png (172 kB)
        6. Figure04.png (177 kB)
        7. Figure05.png (155 kB)
        8. Figure06.png (199 kB)
        9. Figure07.png (246 kB)
        10. Figure08.png (196 kB)
        11. LRU Inconsistency in Client-Server Distributed Lock Caching​.md (14 kB)
        12. mdtest_batch.png (393 kB)
        13. mdtest01.png (204 kB)


            People

              Assignee: Qian Yingjin (qian_wc)
              Reporter: Qian Yingjin (qian_wc)
              Votes: 0
              Watchers: 9
