[LU-8063] Fileset mount with automatic group lock Created: 25/Apr/16  Updated: 12/Jan/18  Resolved: 12/Jan/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor
Reporter: Li Xi (Inactive) Assignee: Li Xi (Inactive)
Resolution: Won't Fix Votes: 0
Labels: patch

Rank (Obsolete): 9223372036854775807

 Description   

Recently, the patch adding fileset mount support was merged into the
master branch. That feature enables mounting sub-directories of a
Lustre file system, which can be used to isolate namespaces between
different groups. Isolated namespaces and different mount points
naturally imply that they might have different filesystem semantics.
That is why I am wondering whether the fileset mount feature can be
combined with the group lock feature.

A group lock is an LDLM lock type that turns off LDLM locking on a
file for group members, thereby avoiding the LDLM lock overhead of
accessing the file. However, using a group lock requires modifying
the application, since an ioctl() must be called to acquire the lock.
This is inconvenient, and sometimes it is simply impossible to modify
the application. Such applications cannot use group locks to improve
performance even when their processes access the file in ways that
never overlap with each other.
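
For reference, here is a minimal user-space sketch of how an application
acquires a group lock today. It assumes the LL_IOC_GROUP_LOCK and
LL_IOC_GROUP_UNLOCK ioctls from <lustre/lustre_user.h>; the file path and
group ID 42 are illustrative only.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <lustre/lustre_user.h>  /* LL_IOC_GROUP_LOCK, LL_IOC_GROUP_UNLOCK */

    int main(void)
    {
            int gid = 42;  /* illustrative group lock ID shared by cooperating processes */
            int fd = open("/mnt/lustre/shared/file", O_RDWR);

            if (fd < 0)
                    return 1;

            /* Take the group lock: I/O from holders of the same GID proceeds
             * without per-I/O LDLM locking between those clients. */
            if (ioctl(fd, LL_IOC_GROUP_LOCK, gid) < 0) {
                    perror("LL_IOC_GROUP_LOCK");
                    close(fd);
                    return 1;
            }

            /* ... perform reads/writes without LDLM lock overhead ... */

            /* Drop the group lock before closing the file. */
            ioctl(fd, LL_IOC_GROUP_UNLOCK, gid);
            close(fd);
            return 0;
    }

The mount option proposed below would make the client perform the equivalent
of these ioctl() calls implicitly, so unmodified applications could benefit.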

Thus, a mount option "grouplock=$GID" could be added to indicate that
any open() on that client also acquires a group lock on the file. All
group members could then share data efficiently, without LDLM lock
overhead, by mounting the file system with the same group lock ID.
Normal clients can still access a file as long as no one is holding
its group lock.

The group lock mount option is not suitable for the top directory of
Lustre, because all other clients would be affected. However, with
the fileset mount feature, a sub-directory of the file system can be
mounted with the group lock option, so that only a subtree has the
special behavior. And that subtree might only be actively accessed by
a few group members.
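
For illustration, a client mounting such a subtree might run something like
the following. This is a hypothetical invocation: the grouplock= option is
only being proposed here, and the MGS node, fileset path, and group ID are
made-up examples.

    # mount the "projA" fileset and take group lock 42 on every open() (proposed)
    mount -t lustre mgsnode@tcp0:/lustre/projA /mnt/projA -o grouplock=42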

Furthermore, we currently have no mechanism like the group lock for
avoiding the LDLM lock latency of metadata access. If we could build
a similar mechanism that provides file system semantics less strict
than POSIX, e.g. NFS-like interfaces or even local-filesystem-like
interfaces without any cache protection or synchronization provided
by LDLM locks, we could use it together with the fileset mount
feature too. Non-POSIX semantics might bring a huge performance
improvement, especially for the weak points of Lustre such as
small-file I/O. And together with the scalability that Lustre
provides, it would enable entirely new use cases.

Anyway, I will push the patch that adds automatic group lock support
soon.



 Comments   
Comment by Gerrit Updater [ 25/Apr/16 ]

Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/19760
Subject: LU-8063 llite: add grouplock mount option
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6b95068bbebd9cf76346fe2365d0d862390f7a4c

Comment by Peter Jones [ 25/Apr/16 ]

Thanks Li Xi

Comment by Andreas Dilger [ 28/Apr/16 ]

Before we land a patch like this, I think there needs to be a clear benefit to having it in the first place.

Enabling group lock on every file in a subtree mount could potentially cause a lot of problems for applications running there, since it means that there will be no cache invalidation between clients, and writes from one client will not be seen by another client without an explicit sync() on the file. It isn't even clear whether closing the file will cause the data to be flushed. Also, any files in this subtree may be permanently inaccessible to other clients that are not mounting with this same grouplock option, if one of the grouplock clients holds the file open, which will cause lock timeouts on the other clients and other potential problems.

Since the main motivation for such a patch is performance due to reduced DLM locking, have you measured some performance improvement from using this patch? If yes, what kind of workload was run to see this performance improvement, and is there some other way to get this improvement with the current DLM without the need to have explicit mount options for such subtrees? That would benefit all Lustre clients, instead of the small subset that would be mounted with the grouplock option.

Comment by Li Xi (Inactive) [ 29/Apr/16 ]

I totally agree that we shouldn't merge this patch in a rush without fully
understanding the benefits and problems.

We currently don't have any application that is using grouplock. I myself am
wondering how much performance improvement could be achieved by using group
locks. It is not easy to draw a clear conclusion because the improvement is
highly correlated with the workload; that means we can't use any existing
benchmark tool to test grouplock. The results of some benchmarks might be
disappointing (if, as I expect, LDLM is already highly optimized).

However, I do believe that there are some workloads which can be accelerated by
grouplock. A perfect place to use grouplock is an I/O forwarding system or I/O
proxy, or any middleware between access nodes (or compute nodes) and Lustre
clients. Because Lustre is essentially wrapped by the middleware, the potential
problems of grouplock can be avoided in many ways. And we are seeing more kinds
of middleware in front of Lustre, including I/O forwarding systems, burst
buffers, Hadoop, OpenStack, or even QEMU. In these use cases the I/O patterns
might not be traditional, and in order to better support those patterns, Lustre
probably does not need to provide POSIX semantics. For example, if we want to
store KVM images on Lustre, we probably don't need the data to be synced in
real time between clients at all.

Of course, those use cases are not the traditional fields that Lustre focuses
on. Lustre is doing great in HPC, but is used less in other fields. In my
opinion, being used by more users in more domains always brings benefits.

Obviously, grouplock won't be useful for all of those cases; its use cases
might be very limited. That is why I am looking at other mechanisms, such as
fscache/cachefiles support for Lustre. But Lustre with fscache also seems
limited, since as far as I can see it can only be used as a read cache of data.
I am wondering whether there is any way to build a writable cache layer on the
Lustre client which is not necessarily based on memory pages, yet can cache
both the metadata and data of a Lustre subtree and buffer all the modifications
in that subtree. That might be more helpful for some use cases, and it is still
worth doing even if the semantics or interfaces of Lustre need to be changed.

Comment by Andreas Dilger [ 02/Nov/17 ]

Li Xi, is this work replaced by the LCOC/PCC feature? Otherwise, I think that using group lock is not really the right way to fix this problem. Group lock still needs to get a DLM lock from the server, so it only makes a difference when there are multiple clients accessing the same file, to avoid lock contention.

Comment by Li Xi (Inactive) [ 12/Jan/18 ]

Hi Andreas,

Yeah, I think PCC might be a better solution for some of the use cases mentioned in this ticket. And readonly-PCC uses grouplock. So I am closing this ticket.

It was good to discuss though. Thanks!

Generated at Sat Feb 10 02:14:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.