[LU-15391] lustre group lock usage and restrictions Created: 22/Dec/21  Updated: 22/Aug/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question/Request
Priority: Minor
Reporter: Olaf Faaland
Assignee: Patrick Farrell
Resolution: Unresolved
Votes: 0
Labels: llnl


 Description   

Suppose one has two processes, one on each of two nodes.  Each opens the same file and locks it via llapi_group_lock(3) using the same group, and the two processes then attempt to read and write non-overlapping extents.

Since they're using a group lock, I understand that neither node will be aware of the writes on the other node.  I would expect this to mean that the extents each node is "assigned" for writing need to be page-aligned, or maybe even stripe-aligned, and that choosing extents any other way could result in reads returning stale data.

Is that correct?  What is the required alignment?  Are there other requirements?



 Comments   
Comment by Olaf Faaland [ 22/Dec/21 ]

I've submitted this as an LU-ticket instead of LUDOC- because I would think the best place to document any requirements would be in the llapi_group_lock(3) man page.  But if that's not the best place, then please modify the ticket appropriately.

Comment by Peter Jones [ 22/Dec/21 ]

Patrick

Could you please advise?

Thanks

Peter

Comment by Patrick Farrell [ 22/Dec/21 ]

Olaf,

You're absolutely right that we need to update the man page and this is a good context in which to do it.

You're also largely right about group locks.  The stale data issue is particularly significant: all nodes sharing a group lock have a full read/write lock on the file, so they consider any data they have to be the authoritative copy (because it's under a write lock!).

So if any part of the file is updated in one place and then read in another, or read and then updated elsewhere, stale data can result as you'd expect.  The granularity for this is page granularity, so if I/Os 'overlap' at the level of pages, they are subject to this stale data risk.  And if two clients do partial updates of the same page, read-modify-write means various intermixings of new and old data are possible depending on timing.
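
To make "overlap at the level of pages" concrete, here is a minimal Python sketch that checks whether two byte extents touch a common page.  The function names are hypothetical, and page_size is a parameter because this thread does not settle which page size applies when client and server differ.

```python
def page_span(offset, length, page_size):
    """Return (first_page, last_page) indices touched by a byte extent."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return first, last


def share_a_page(ext_a, ext_b, page_size):
    """True if two (offset, length) byte extents touch at least one common page."""
    a_first, a_last = page_span(ext_a[0], ext_a[1], page_size)
    b_first, b_last = page_span(ext_b[0], ext_b[1], page_size)
    # Two closed page ranges intersect when each starts before the other ends.
    return a_first <= b_last and b_first <= a_last
```

For example, with 4096-byte pages, the byte ranges 0-3999 and 4000-4095 do not overlap as bytes but both fall in page 0, so share_a_page((0, 4000), (4000, 96), 4096) reports an overlap.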

So in general it's not safe for accesses from different nodes to overlap while holding a group lock if any of them is a write.  (Local to one node, the page cache handles it and things work normally.)  Read-read is safe, but write-read, read-write, and even write-write cannot be safe.  In the case of write-write, the issue is that you do not know if the first write is on disk when you start the second.  If you do page-aligned writes, you can make write-write safe by using sync() calls.  But non-page-aligned writes include a read, so stale data issues apply: Lustre always writes full pages to disk, so if a partial page write is attempted, it first reads the remainder of the page from disk.
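
A minimal sketch of the "page-aligned write plus sync" pattern, in Python.  The helper name is hypothetical, and using os.fsync() on the file descriptor is an assumption about what kind of sync call is meant above; no Lustre API is called here.

```python
import os


def aligned_write_and_sync(fd, offset, data, page_size):
    """Write data at offset only if both are page-aligned, then flush to disk."""
    if offset % page_size or len(data) % page_size:
        raise ValueError("offset and length must be multiples of page_size")
    written = 0
    # os.pwrite may write fewer bytes than requested, so loop until done.
    while written < len(data):
        written += os.pwrite(fd, data[written:], offset + written)
    # Assumption: fsync is the "sync" that makes the data durable before
    # another node writes the same pages.
    os.fsync(fd)
    return written
```

Rejecting unaligned requests outright keeps the caller from silently triggering a partial-page write and the read that comes with it.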

One other thing: while a file is open with a group lock, the file size (the size shown by 'stat', or the size used when writing with O_APPEND) may be inaccurate if the file has been extended or truncated.  Generally it would just be potentially stale, but there are various possibilities.  In general, size cannot be trusted if you're using a group lock and writing to the file.

One strategy I've seen used is to detect overlapping accesses in a userspace library, and when they are detected, to release the group lock on all nodes, then re-acquire it.  This syncs any dirty data to disk and clears the file from cache on all the nodes, so it's very expensive.
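
The detection half of that strategy could look roughly like the following Python sketch.  The class and its methods are hypothetical and are not taken from any existing library; how nodes exchange access records, and the actual release and re-acquire of the group lock, are outside the sketch.

```python
class ConflictTracker:
    """Track which nodes read or wrote each page since the last lock cycle."""

    def __init__(self, page_size):
        self.page_size = page_size
        self.readers = {}  # page index -> set of node names that read it
        self.writers = {}  # page index -> set of node names that wrote it

    def record(self, node, offset, length, is_write):
        """Record an access; return True if it conflicts with another node."""
        first = offset // self.page_size
        last = (offset + length - 1) // self.page_size
        conflict = False
        for page in range(first, last + 1):
            other_writers = self.writers.get(page, set()) - {node}
            other_readers = self.readers.get(page, set()) - {node}
            # Read-read is safe; anything involving a write by another node is not.
            if other_writers or (is_write and other_readers):
                conflict = True
            table = self.writers if is_write else self.readers
            table.setdefault(page, set()).add(node)
        return conflict

    def reset(self):
        """Call after the group lock has been released and re-acquired everywhere."""
        self.readers.clear()
        self.writers.clear()
```

When record() returns True, the caller would trigger the expensive release and re-acquire on all nodes and then call reset(), since the caches have been cleared at that point.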

In general, if you page-align your writes and only write to the file, you'd be fine.  Not much else is really safe except read-only access, and if your access is read-only, why use a group lock?
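
One way to arrive at page-aligned, write-only extents is to divide the file by whole pages among the writers.  The Python sketch below is illustrative only: the function name is hypothetical, and page_size is left as a parameter.

```python
def aligned_extents(total_bytes, n_writers, page_size):
    """Split [0, total_bytes) into one (offset, length) extent per writer.

    Every extent starts on a page boundary and no page is shared between
    writers. Only the last non-empty extent may end mid-page, at end of file.
    """
    total_pages = -(-total_bytes // page_size)  # ceiling division
    base, extra = divmod(total_pages, n_writers)
    extents = []
    start_page = 0
    for i in range(n_writers):
        n_pages = base + (1 if i < extra else 0)
        offset = start_page * page_size
        end = min((start_page + n_pages) * page_size, total_bytes)
        extents.append((offset, max(end - offset, 0)))
        start_page += n_pages
    return extents
```

With more writers than pages, the surplus writers receive zero-length extents rather than a share of someone else's page.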

It would help me if you asked questions or requested clarification wherever this is unclear (or, even if something is confusing but you can figure it out, pointed that out).  Consider it a first review of some of the content and possible phrasing for the man page.

Comment by Olaf Faaland [ 28/Dec/21 ]

Hi Patrick,

Thanks, this is very helpful.  What you wrote makes sense to me.  I had forgotten about lockahead, so I suggested it, as a safer alternative, to the person who was interested in group locks.

-Olaf

Comment by Olaf Faaland [ 22/Aug/22 ]

Hi Patrick,

> The granularity for this is page granularity

In the case where the client and server have different page sizes, whose page size is the relevant one?

thanks,
Olaf
