  1. Lustre
  2. LU-17329

Relaxed POSIX Consistency for Lustre



    • New Feature
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • None
    • 3


      If performance is a criterion, consistency requirements may be best decided by the applications or users themselves. Forcing an application that has little or no sharing to use a strong or strict consistency model may reduce I/O performance unnecessarily. Traditional techniques for providing strong file system consistency guarantees for both metadata and data use variants of locking. For example, Lustre and GPFS use distributed lock manager (DLM) locking to implement POSIX semantics with strong consistency. Rather than locking across the entire file system to serialize read-write or write-write sharing, we can use an optimistic concurrency control mechanism, on the presumption that such sharing is rare. Avoiding distributed locking improves the scalability and performance of the system.
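
To make the optimistic alternative concrete, here is a minimal sketch of version-based optimistic concurrency control. It is an illustration only, not Lustre code: `VersionedObject`, `VersionConflict`, and `optimistic_update` are hypothetical names, and a real implementation would validate versions on the server rather than inside one process.

```python
import threading


class VersionConflict(Exception):
    """Raised when a commit is based on a stale version."""


class VersionedObject:
    """Hypothetical stand-in for a file region or metadata record."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        # Guards only the short validate-and-commit step, not the whole update.
        self._commit_guard = threading.Lock()

    def read(self):
        """Return a (value, version) snapshot."""
        with self._commit_guard:
            return self.value, self.version

    def commit(self, new_value, expected_version):
        """Install new_value only if nobody committed since expected_version."""
        with self._commit_guard:
            if self.version != expected_version:
                raise VersionConflict(
                    f"expected v{expected_version}, found v{self.version}")
            self.value = new_value
            self.version += 1
            return self.version


def optimistic_update(obj, transform, max_retries=8):
    """Read, compute, and commit; retry if a concurrent writer got in first."""
    for _ in range(max_retries):
        value, version = obj.read()
        try:
            return obj.commit(transform(value), version)
        except VersionConflict:
            continue  # assumed to be rare; re-read and try again
    raise VersionConflict("gave up after repeated conflicts")
```

No lock is held while `transform` runs, so uncontended updates pay only for the brief validation step. The cost moves to the conflict path, which is acceptable only if sharing really is rare.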

      Since different applications can have different sharing behavior, a single design that targets both performance and consistency would have to cater to all of their needs simultaneously. Parallel cluster file systems (such as Lustre and GPFS) enforce data consistency by using byte-range distributed locking to allow simultaneous file access from multiple clients to their shared disks. Such fine-grained file locking schemes allow multiple processes to write simultaneously to different regions of a shared file. However, they also restrict scalability because of the overhead of maintaining the state of a large number of locks, which eventually degrades performance.
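
The byte-range scheme described above can be sketched as a small in-memory lock table. This is a simplified, single-process illustration, not the Lustre DLM: `Extent` and `ExtentLockTable` are hypothetical names, and real lock managers also handle callbacks, lock conversion, and extent growth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    owner: str  # client identifier (hypothetical)
    start: int  # first byte, inclusive
    end: int    # last byte, inclusive
    mode: str   # "R" (shared) or "W" (exclusive)


class ExtentLockTable:
    """Tracks granted byte-range locks for one file."""

    def __init__(self):
        self.granted = []

    @staticmethod
    def _conflicts(a, b):
        overlap = a.start <= b.end and b.start <= a.end
        # Two readers never conflict; a writer conflicts with any other owner.
        return overlap and a.owner != b.owner and "W" in (a.mode, b.mode)

    def try_lock(self, owner, start, end, mode):
        """Grant the lock if no held extent conflicts; return True on success."""
        request = Extent(owner, start, end, mode)
        # Every request is checked against every granted extent, so both
        # the state and the lookup cost grow with the number of locks.
        if any(self._conflicts(request, held) for held in self.granted):
            return False
        self.granted.append(request)
        return True

    def unlock(self, owner, start, end):
        """Drop the extent previously granted to owner for [start, end]."""
        self.granted = [
            e for e in self.granted
            if not (e.owner == owner and e.start == start and e.end == end)
        ]
```

Writers on disjoint ranges are both granted, which is the concurrency benefit. The `granted` list is the per-file state whose size and scan cost illustrate the scalability concern.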

      In a POSIX-compliant distributed file system, the behavior seen by multiple processes on multiple client nodes should be the same as the behavior of a local file system. Lustre provides POSIX-compliant consistency. However, the POSIX consistency semantics could be carefully relaxed in some cases to better align with the needs of specific applications and to improve system performance. A user can define not just standard consistency policies like POSIX, but also custom policies like session, lease, and NFS, at a chosen granularity (subtree or file). A client can use several different consistency policies for different files, or even change the consistency policy for a given file at runtime, without having to restart the file system. Leaving the choice of consistency policy to the user, and allowing it to change at runtime, enables performance tuning at a very fine granularity.
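
One way to picture per-subtree and per-file policies is a table resolved by the nearest enclosing path. The sketch below is an assumption about how such a lookup might work, not an existing Lustre interface: `ConsistencyPolicyTable` and its methods are hypothetical, and the policy names are taken from the paragraph above.

```python
import posixpath

# Policy names mentioned in the proposal; the set itself is illustrative.
KNOWN_POLICIES = {"posix", "session", "lease", "nfs"}


class ConsistencyPolicyTable:
    """Maps files and subtrees to a consistency policy (hypothetical API)."""

    def __init__(self, default="posix"):
        self.default = default
        self._rules = {}

    def set_policy(self, path, policy):
        """Attach a policy to a file or subtree root; takes effect at once."""
        if policy not in KNOWN_POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self._rules[posixpath.normpath(path)] = policy

    def clear_policy(self, path):
        """Remove a rule so the path inherits from its parent again."""
        self._rules.pop(posixpath.normpath(path), None)

    def policy_for(self, path):
        """Return the policy of the nearest ancestor that has a rule."""
        current = posixpath.normpath(path)
        while True:
            if current in self._rules:
                return self._rules[current]
            parent = posixpath.dirname(current)
            if parent == current:  # reached the root without a match
                return self.default
            current = parent
```

A short usage example: a job directory is relaxed to NFS-like semantics while one shared log inside it stays strict, and the subtree rule is later changed without touching anything else.

```python
table = ConsistencyPolicyTable()
table.set_policy("/scratch/job1", "nfs")
table.set_policy("/scratch/job1/shared.log", "posix")
table.policy_for("/scratch/job1/out/data.bin")  # resolves to "nfs"
table.set_policy("/scratch/job1", "session")    # runtime change
```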

      One approach to relaxing consistency is to decouple the namespace: a client locks the subtree it wants exclusive access to in MetaWBC mode, and the file system can then optimize performance through lockless I/O and by merging updates. The file system could enter a mode in which operations on the given subtree are performed locally and their updates are bulk-merged at completion. This delayed merge (a form of eventual consistency) and relaxed durability improve performance and scalability by avoiding the costs of remote procedure calls (RPCs), synchronization, false sharing, and serialization.
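
The local-operation-then-merge idea can be sketched with a client-side journal that is applied to the shared namespace in one batch. This is a toy model under stated assumptions, not the MetaWBC implementation: `GlobalNamespace`, `DecoupledSubtree`, and `rpc_count` are hypothetical, and exclusive access to the subtree is assumed rather than enforced.

```python
class GlobalNamespace:
    """Hypothetical stand-in for the shared metadata service."""

    def __init__(self):
        self.entries = {}
        self.rpc_count = 0  # counts simulated round trips

    def apply_batch(self, operations):
        """Apply a list of (op, path, payload) tuples in one round trip."""
        self.rpc_count += 1
        for op, path, payload in operations:
            if op == "create":
                self.entries[path] = payload
            elif op == "unlink":
                self.entries.pop(path, None)


class DecoupledSubtree:
    """Client-side view of a subtree the client is assumed to hold exclusively."""

    def __init__(self, namespace, root):
        self.namespace = namespace
        self.root = root.rstrip("/")
        self.journal = []  # operations recorded locally, not yet durable

    def _check(self, path):
        if not path.startswith(self.root + "/"):
            raise ValueError(f"{path} is outside {self.root}")

    def create(self, path, payload=b""):
        self._check(path)
        self.journal.append(("create", path, payload))

    def unlink(self, path):
        self._check(path)
        self.journal.append(("unlink", path, None))

    def merge(self):
        """Bulk-merge all journaled operations; return how many were sent."""
        pending, self.journal = self.journal, []
        if pending:
            self.namespace.apply_batch(pending)
        return len(pending)
```

Until `merge()` runs, other clients see none of the updates and a client crash would lose them, which is the relaxed durability trade-off. In exchange, any number of local operations costs a single simulated round trip.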

      We present an API and framework that allow administrators to dynamically control the consistency guarantees for subtrees in the global file system namespace. Allowing different semantics to coexist in a global namespace scales further and performs better than a system that uses a single POSIX consistency mode.

      Our initial goal is to improve IO500 performance by using a relaxed consistency model in Lustre, similar to that of NFS.


              qian_wc Qian Yingjin