
LU-8063: Fileset mount with automatic group lock

Details

    • Type: New Feature
    • Resolution: Won't Fix
    • Priority: Minor

    Description

      Recently, a patch adding fileset mount support was merged into the
      master branch. That feature enables mounting a sub-directory of a
      Lustre file system (e.g. "mount -t lustre mgsnode@tcp:/lustre/subdir
      /mnt/sub"), which can be used as a way to isolate namespaces between
      different groups. Isolated namespaces and separate mount points
      naturally imply that they might have different file system semantics.
      That is why I am wondering whether the fileset mount feature can be
      combined with the group lock feature.

      A group lock is an LDLM lock type that turns off LDLM locking on a
      file for group members, which avoids the LDLM lock overhead of
      accessing the file. However, using a group lock requires modifying
      the application, since an ioctl() needs to be called to acquire the
      lock (see the sketch below). This is not convenient, since sometimes
      it is simply impossible to modify the application. Such applications
      cannot use group locks to speed up performance, even if their
      processes access the file in a way that does not overlap.
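      For reference, here is a minimal sketch of the explicit sequence an
      application has to run today. It assumes the LL_IOC_GROUP_LOCK and
      LL_IOC_GROUP_UNLOCK ioctls from <lustre/lustre_user.h>; the file path
      and group ID are arbitrary examples.

        /* Acquire a Lustre group lock explicitly, do I/O, then release it. */
        #include <stdio.h>
        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <lustre/lustre_user.h>

        int main(void)
        {
                int gid = 1234;  /* group ID shared by all cooperating processes */
                int fd = open("/mnt/lustre/shared/data", O_RDWR);

                if (fd < 0) {
                        perror("open");
                        return 1;
                }
                if (ioctl(fd, LL_IOC_GROUP_LOCK, gid) < 0) {
                        perror("LL_IOC_GROUP_LOCK");
                        close(fd);
                        return 1;
                }

                /* ... reads/writes here proceed without extent lock traffic ... */

                ioctl(fd, LL_IOC_GROUP_UNLOCK, gid);  /* release before close */
                close(fd);
                return 0;
        }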

      Thus, a mount option "grouplock=$GID" could be added to indicate that
      any open() on that client also acquires a group lock on the opened
      file. All group members could then share data efficiently, without
      LDLM lock overhead, by mounting the file system with the same group
      lock ID. And normal clients can still access a file as long as nobody
      is holding its group lock.
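      With the proposed option, the same data path would need no
      Lustre-specific code at all. A hypothetical sketch follows; the
      grouplock mount option does not exist yet, it is what this ticket
      proposes, and the mount command and paths are made-up examples.

        /* Hypothetical: this client mounted the subtree with something like
         *   mount -t lustre mgsnode@tcp:/lustre/subdir /mnt/grp -o grouplock=1234
         * so the open() below would implicitly take group lock 1234, giving an
         * unmodified application the same benefit as the explicit ioctl() above. */
        #include <stdio.h>
        #include <fcntl.h>
        #include <unistd.h>

        int main(void)
        {
                char buf[4096];
                int fd = open("/mnt/grp/data", O_RDONLY);  /* group lock taken implicitly */

                if (fd < 0) {
                        perror("open");
                        return 1;
                }
                while (read(fd, buf, sizeof(buf)) > 0)
                        ;  /* I/O without per-extent LDLM lock traffic */
                close(fd);  /* group lock presumably dropped on last close */
                return 0;
        }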

      The group lock mount option is not suitable for the top directory of
      Lustre, because all other clients would be affected. However, with
      the fileset mount feature, a sub-directory of the file system can be
      mounted with the group lock option, so only that subtree gets the
      special behavior. And that subtree might only be actively accessed by
      a few group members.

      Furthermore, we currently have no group-lock-like way to avoid LDLM
      lock latency for metadata access. But if we were able to build a
      similar mechanism that provides file system semantics less strict
      than POSIX, e.g. NFS-like interfaces, or even local-filesystem-like
      interfaces without any cache protection or synchronization from LDLM
      locks, we could use it together with the fileset mount feature too.
      Non-POSIX semantics might bring huge performance improvements,
      especially for the weak points of Lustre like small-file I/O. And
      together with the scalability that Lustre can provide, it would
      enable entirely new use cases.

      Anyway, I will push the patch that adds automatic group lock support
      soon.


        Activity


          Li Xi (Inactive) added a comment -

          Hi Andreas,

          Yeah, I think PCC might be a better solution for some of the use
          cases mentioned in this ticket, and readonly PCC uses grouplock.
          So I am closing this ticket.

          It was good to discuss, though. Thanks!

          Andreas Dilger added a comment -

          Li Xi, is this work replaced by the LCOC/PCC feature? Otherwise,
          I think that using group lock is not really the right way to fix
          this problem. Group lock still needs to get a DLM lock from the
          server, so it only makes a difference when there are multiple
          clients accessing the same file, to avoid lock contention.

          Li Xi (Inactive) added a comment -

          I totally agree that we shouldn't merge this patch in a rush
          without fully understanding the benefits and problems.

          We currently don't have any application that is using grouplock.
          I myself am wondering how much performance improvement could be
          achieved by using grouplocks. It is not easy to reach a clear
          conclusion, because the performance improvement is highly
          correlated with the workload. That means we can't just use any
          existing benchmark tool to test grouplock. The results of some
          benchmarks might be disappointing (if, as I expect, LDLM is
          already highly optimized).

          However, I do believe that there are workloads which can be
          accelerated by grouplock. A perfect place to use grouplock is an
          I/O forwarding system or I/O proxy, or any middleware between
          access nodes (or compute nodes) and Lustre clients. Because
          Lustre is wrapped by the middleware, the potential problems of
          grouplock can be avoided in many ways. And we are seeing more
          kinds of middleware in front of Lustre, including I/O forwarding
          systems, burst buffers, Hadoop, OpenStack, or even QEMU. In these
          use cases the I/O patterns might not be traditional, and in order
          to better support those patterns, Lustre probably does not need
          to provide POSIX semantics. For example, if we want to store KVM
          images on Lustre, we probably don't need the data to be synced
          between clients in real time at all.

          Of course, those use cases are not the traditional fields that
          Lustre is focusing on. Lustre is doing great in HPC, but is used
          less in other fields. In my opinion, being used by more users in
          more domains always brings benefits.

          Obviously, grouplock won't be useful for all of those cases; its
          use cases might be very limited. That is why I am looking for
          other mechanisms, such as fscache/cachefiles support for Lustre.
          But Lustre with fscache also seems limited: as far as I can see,
          it can only be used as a read cache of data. I am wondering
          whether there is any way to build a writable cache layer on the
          Lustre client which is not necessarily based on memory pages, yet
          can cache both the metadata and data of a Lustre subtree and
          buffer all the modifications in that subtree. That might be more
          helpful for some use cases. And it would still be worth doing
          even if the semantics or interfaces of Lustre need to be changed.

          Andreas Dilger added a comment -

          Before we land a patch like this, I think there needs to be a
          clear benefit to having it in the first place.

          Enabling group lock on every file in a subtree mount could
          potentially cause a lot of problems for applications running
          there, since it means that there will be no cache invalidation
          between clients, and writes from one client will not be seen by
          another client without an explicit sync() on the file. It isn't
          even clear whether closing the file will cause the data to be
          flushed. Also, any files in this subtree may be permanently
          inaccessible to other clients that are not mounted with this same
          grouplock option: if one of the grouplock clients holds a file
          open, that will cause lock timeouts on the other clients, among
          other potential problems.

          Since the main motivation for such a patch is performance due to
          reduced DLM locking, have you measured a performance improvement
          from using this patch? If yes, what kind of workload was run to
          see this improvement, and is there some other way to get it with
          the current DLM, without the need for explicit mount options on
          such subtrees? That would benefit all Lustre clients, instead of
          the small subset that would be mounted with the grouplock option.
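          To make the cache-coherency concern above concrete, a hypothetical
          sketch: run with the argument "writer" on one client and "reader"
          on another, both assumed to have mounted the subtree with the
          proposed grouplock option (the path is a made-up example). Without
          the commented-out fsync(), the reader may see stale data
          indefinitely, since group lock members get no cache invalidation.

            /* Hypothetical demo of the stale-read hazard between two
             * grouplock-mounted clients that take no extent locks. */
            #include <stdio.h>
            #include <string.h>
            #include <fcntl.h>
            #include <unistd.h>

            int main(int argc, char **argv)
            {
                    const char *path = "/mnt/grp/shared";
                    char buf[16] = "";
                    int fd;

                    if (argc > 1 && strcmp(argv[1], "writer") == 0) {
                            fd = open(path, O_WRONLY | O_CREAT, 0644);
                            if (fd < 0)
                                    return 1;
                            write(fd, "new", 3);  /* stays in this client's cache... */
                            /* fsync(fd); */      /* ...unless explicitly flushed */
                            close(fd);
                    } else {
                            fd = open(path, O_RDONLY);
                            if (fd < 0)
                                    return 1;
                            read(fd, buf, sizeof(buf) - 1);  /* may read stale data */
                            printf("read: %s\n", buf);
                            close(fd);
                    }
                    return 0;
            }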
          Peter Jones added a comment -

          Thanks Li Xi

          Gerrit Updater added a comment -

          Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/19760
          Subject: LU-8063 llite: add grouplock mount option
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 6b95068bbebd9cf76346fe2365d0d862390f7a4c

          People

            Assignee:
            Li Xi (Inactive)
            Reporter:
            Li Xi (Inactive)
            Votes:
            0
            Watchers:
            11
