[LU-10885] enable flock by default Created: 06/Apr/18  Updated: 31/Aug/23  Resolved: 19/Feb/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: Patrick Farrell (Inactive)
Resolution: Fixed Votes: 0
Labels: easy

Issue Links:
Related
is related to LU-8069 Allow remount to include "flock" and ... Open
is related to LU-12348 "flock" mount option should be enable... Closed
Rank (Obsolete): 9223372036854775807

 Description   

We should consider enabling the flock mount option by default, while still allowing the localflock and noflock options for users who do not want this functionality.  From looking at issues reported on http://stackexchange.com/ and elsewhere, it seems that the lack of flock functionality by default is an obstacle for many users who want to run databases on top of Lustre.

If the users are not using flock functionality, I don't think this adds any overhead, and if they are using this functionality then they want it enabled in any case.
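As a rough illustration of the three options named above, the following Python sketch classifies the flock mode of each Lustre client mount from text in the format of /proc/mounts. The helper names flock_mode and lustre_flock_modes are hypothetical, and the sketch assumes the options appear in the mount table under the same names used on the mount command line; that is an assumption, not something confirmed in this ticket.

```python
FLOCK_OPTIONS = ("flock", "localflock", "noflock")


def flock_mode(mount_options):
    """Return the last flock-related option in a comma-separated option string.

    Returns "default" when none is listed, meaning the behavior is whatever
    the installed Lustre version picks on its own.
    """
    mode = "default"
    for opt in mount_options.split(","):
        if opt in FLOCK_OPTIONS:
            mode = opt  # simplification: a later option overrides an earlier one
    return mode


def lustre_flock_modes(mounts_text):
    """Map each Lustre mount point to its flock mode.

    mounts_text is expected in /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "lustre":
            modes[fields[1]] = flock_mode(fields[3])
    return modes
```

On a client this could be called as lustre_flock_modes(open("/proc/mounts").read()); a result of "default" only says that no flock option was listed explicitly.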



 Comments   
Comment by Patrick Farrell (Inactive) [ 17/Apr/18 ]

I would offer one further thought - I've talked to people a few times who said "oh localflock makes the not supported message go away, that's fine, I'll run my app like that", without any idea what they were doing, risking incorrect operation in their multi-node app.  "flock" on by default makes perfect sense to me.  (Cray has added it everywhere for years and years now.)

Comment by Gerrit Updater [ 19/Apr/18 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/32091
Subject: LU-10885 llite: enable flock mount option by default
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 40cf09afb13c6c773ec6781a54059c3472c7f15d

Comment by Gerrit Updater [ 19/Apr/18 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/32092
Subject: LU-10885 tests: clean up flocks_test code style
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 49e278280cb133a2b1b7db1debe1cd93a9fcb967

Comment by Gerrit Updater [ 18/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32092/
Subject: LU-10885 tests: fix up flocks_test bugs and code style
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3ce41b7168f7a3b5bacb5ae35f278dce4a994fae

Comment by Gerrit Updater [ 18/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32091/
Subject: LU-10885 llite: enable flock mount option by default
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3613af3e15cbc6091e3a16c8caeb1307be2d91f6

Comment by Peter Jones [ 19/Feb/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 29/May/19 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34987
Subject: LU-10885 llite: enable flock mount option by default
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 20ed66f1a6aac7623185da70da20950d22f4c666

Comment by Gerrit Updater [ 03/Jul/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34987/
Subject: LU-10885 llite: enable flock mount option by default
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 16fb13eb386380a4eb46b7e016a66cb38a01f54f

Comment by Aurelien Degremont (Inactive) [ 16/Jul/19 ]

How does this fit with this paragraph from the Lustre wiki:

flock: enable support for cluster-wide, coherent file locks. Must be applied to the mount commands for all clients that will be accessing common data requiring lock functionality. Cluster-wide locking will have a detrimental impact on file system performance, and should only be enabled when absolutely required. For some applications, the locking is only necessary on a sub-set of nodes. For example, the CTDB cluster framework used by Samba to provide a parallel, high-availability SMB gateway, relies on locking of a shared file when coordinating cluster start-up and recovery. However, only the CTDB nodes need to mount the Lustre file system with the flock option. This is an example of application or domain-specific lock requirements.

If you really think there is very limited performance impact, we should probably revise this wiki page.

Comment by Patrick Farrell (Inactive) [ 16/Jul/19 ]

Oh, jeez - I'm not sure when that was written, but it's completely wrong.  It has no measurable impact on performance at all, unless you've got an app that ends up contending those locks between nodes, in which case it is asking for mutual exclusion, and we are only following its requests.  The only case where you'd see a performance hit is in the case of an app making heavy use of locks it doesn't need.  (I am not considering the case where the app makes heavy use of the locks for good reason.  In that case, it needs them for correct operation.)

degremoa, can you link the wiki page or clean it up yourself?  It's just totally wrong.

Comment by Aurelien Degremont (Inactive) [ 16/Jul/19 ]

Here is the page: http://wiki.lustre.org/Mounting_a_Lustre_File_System_on_Client_Nodes

It's probably better that you fix it with the wording you think is appropriate.

 

For my own knowledge: I've never looked at how flock is implemented, but enabling flock does not change at all how locking works for any other Lustre resources? It only has an impact for users explicitly calling flock on their files?

Comment by Patrick Farrell (Inactive) [ 16/Jul/19 ]

Correct - flock is an independent type of MDT lock (it's sort of like if there was a flock bit in the IBITS bit set, but it's not implemented that way), and flocks only interact with other flocks.  So only for those users.  They don't interact at all with other locks - they don't conflict, etc.

I'll see if I can still edit the wiki.

Comment by Andreas Dilger [ 16/Jul/19 ]

The flock locks are implemented via DLM on the MDS, but in a separate lock namespace from the IBITS locks used for regular files. They are only used if flock is requested by the application, so they should have no impact if the application is not using this feature.
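To make "requested by the application" concrete, here is a minimal Python sketch of an application explicitly asking for an advisory flock. It uses only the standard fcntl.flock call and is not Lustre-specific; the helper name try_exclusive_flock is hypothetical. The mutual exclusion it demonstrates between two open file descriptions is ordinary Linux flock behavior on a local file; whether that exclusion is coherent across client nodes depends on the mount option discussed in this ticket.

```python
import fcntl
import os


def try_exclusive_flock(path):
    """Open path and attempt a non-blocking exclusive flock.

    Returns the locked file descriptor, or None if another open file
    description already holds a conflicting lock. Any other OSError
    (for example, one raised because the file system does not support
    flock) is propagated to the caller.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    except OSError:
        os.close(fd)
        raise
    return fd
```

A program that never makes a call like this never takes a flock, which is the case described above as having no impact.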

Comment by Aurelien Degremont (Inactive) [ 16/Jul/19 ]

Thanks a lot for the explanations!

Comment by Patrick Farrell (Inactive) [ 16/Jul/19 ]

Thank you for pointing out the wiki - I've updated it to give users a strong push towards 'flock'.  (localflock has always seemed like a way for ambitious users to corrupt their data)

One note: I don't actually know when we switched from whatever flock implementation had a performance cost, but it was a long time ago, so I just settled on "2.x is fine".  (It's definitely true of 2.4/2.5, as well as at least the Seagate 2.1, so I'm pretty sure it's correct.)

Comment by Gerrit Updater [ 15/Aug/23 ]

"Laura Hild <lsh@jlab.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51948
Subject: LU-10885 docs: note flock now being enabled by default
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 10701dd9101831bbe9521705d2a3754560eaa920

Comment by Gerrit Updater [ 31/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51948/
Subject: LU-10885 docs: note flock now being enabled by default
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b557fb21c8dfa676fd4ec528fed3d8ea17bc665f

Generated at Sat Feb 10 02:39:02 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.