[LU-10681] Disable tiny writes for O_APPEND Created: 18/Feb/18  Updated: 13/Mar/18  Resolved: 08/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Blocker
Reporter: Patrick Farrell (Inactive) Assignee: Patrick Farrell (Inactive)
Resolution: Fixed Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Unfortunately, tiny writes do not work correctly with O_APPEND. In short, O_APPEND depends on holding LDLM locks to EOF (on all stripes/components) to protect the file size, whereas tiny writes require only that the page being written is locked.

This means the LDLM lock on a stripe that does not contain this page could be granted to another client, and that client could extend the file without revoking our lock.

The simplest example involves multiple stripes, but the same problem could occur with a single stripe.

1. Client 1 writes to a page on stripe 1, dirtying it. File size is, for example, 2K. (not O_APPEND)
2. Client 2 writes to some part of the file on stripe 2. File size is now 1 MB + 2K. (not O_APPEND)
3. Client 1 does an O_APPEND write of 1K. The tiny writes code notices that the page at the expected file size is present, locks the full file locally, checks the size, sees 2K (because that is the locally known size), and writes there.

Data is not in the correct location. Ouch.
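
The sequence above can be modelled with a toy simulation. This is only a sketch, not Lustre code: `ToyFile`, `Client`, `tiny_append`, `normal_append` and `run_scenario` are hypothetical names, and a 1 MiB stripe size is assumed so that client 2's write on stripe 2 starts at offset 1 MiB.

```python
from dataclasses import dataclass

KIB = 1024
MIB = 1024 * KIB


@dataclass
class Client:
    """Toy client that only remembers its locally cached file size."""
    name: str
    cached_size: int = 0


class ToyFile:
    """Toy file that tracks the true size and where each write landed."""

    def __init__(self):
        self.true_size = 0
        self.writes = []  # (client name, offset, length)

    def write(self, client, offset, length):
        # A plain (non-append) write updates the true size and the
        # writer's own cached size, but not other clients' caches.
        self.writes.append((client.name, offset, length))
        self.true_size = max(self.true_size, offset + length)
        client.cached_size = max(client.cached_size, offset + length)

    def tiny_append(self, client, length):
        # Assumed tiny-write behaviour: trust the cached size, no refresh.
        offset = client.cached_size
        self.write(client, offset, length)
        return offset

    def normal_append(self, client, length):
        # Assumed normal path: locking to EOF refreshes the size first.
        client.cached_size = self.true_size
        offset = client.cached_size
        self.write(client, offset, length)
        return offset


def run_scenario(append):
    """Replay the three steps and return (append offset, final true size)."""
    f = ToyFile()
    c1, c2 = Client("client1"), Client("client2")
    f.write(c1, 0, 2 * KIB)    # step 1: client 1 dirties a page, size 2K
    f.write(c2, MIB, 2 * KIB)  # step 2: client 2 extends to 1 MB + 2K
    offset = append(f, c1, KIB)  # step 3: client 1 appends 1K
    return offset, f.true_size
```

In this model, `run_scenario(ToyFile.tiny_append)` places the append at offset 2K, in the middle of the file, while `run_scenario(ToyFile.normal_append)` places it at 1 MB + 2K, the real end of file.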

Two possible fixes:
1. Don't do tiny writes with O_APPEND
2. Do some sort of glimpse before every write
^-- This almost certainly removes the point of doing tiny writes, because it would be so slow. Better to do normal writes and take the full file locks required.
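
Option 1 amounts to one extra condition in whatever check decides whether a write may take the tiny write path. A minimal sketch, assuming a hypothetical `can_use_tiny_write` helper; the other conditions shown (a cached page, a length below an assumed 4096-byte page size) are illustrative only and are not taken from the patch:

```python
import os

PAGE_SIZE = 4096  # assumed page size, for illustration only


def can_use_tiny_write(open_flags, length, page_cached):
    """Hypothetical eligibility check for the tiny write path."""
    # O_APPEND needs locks to EOF to trust the file size, so always
    # fall back to the normal write path for appends.
    if open_flags & os.O_APPEND:
        return False
    # Illustrative remaining conditions: small write, page already cached.
    return page_cached and 0 < length < PAGE_SIZE
```

With this check, an append of any size goes through the normal write path and takes the full file locks, at the cost of losing the tiny write speedup for appends.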

One further thought:
We could arrange for the size update code for tiny writes to check whether the client holds LDLM locks to EOF. If it is verified to hold them after the start of the tiny write, we can be sure the file size on our client is up to date, or at least that it is not older than the start of the current write. This would still be faster than the normal write path, but much slower than normal tiny writes.
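
The lock check described here could look roughly like the following. This is a sketch under stated assumptions, not the Lustre implementation: locks are modelled as per-stripe lists of `(start, end)` byte extents, `EOF` is a hypothetical sentinel meaning "to end of object", and `holds_locks_to_eof` is an invented name.

```python
EOF = 2**64 - 1  # hypothetical sentinel: extent reaches the end of the object


def holds_locks_to_eof(locks_by_stripe, known_size_by_stripe):
    """Return True if every stripe has a held extent covering [size, EOF].

    locks_by_stripe: dict of stripe index -> list of (start, end) extents
    known_size_by_stripe: dict of stripe index -> locally known object size
    """
    for stripe, known_size in known_size_by_stripe.items():
        extents = locks_by_stripe.get(stripe, [])
        covered = any(
            start <= known_size and end == EOF for start, end in extents
        )
        if not covered:
            # Another client may have extended this stripe without
            # revoking any lock of ours, so the cached size may be stale.
            return False
    return True
```

If the check fails, the write would have to fall back to the normal path. Every stripe has to be examined on each append, which is where the extra cost relative to a plain tiny write comes from.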

I'll submit a patch. I will probably stick with the simplest solution: don't do tiny writes for O_APPEND. I will explore the lock-checking option, but that may be a little complicated, and the more complicated the tiny writes code gets, the less benefit it has.

Note this problem is likely to be hit in the real world because O_APPEND writes are very often small.



 Comments   
Comment by Gerrit Updater [ 20/Feb/18 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/31353
Subject: LU-10681: Disable tiny writes for append
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6ff9be27f9c1dceeff2d44adcbe2de778db7bc8a

Comment by Gerrit Updater [ 08/Mar/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31353/
Subject: LU-10681: Disable tiny writes for append
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d79ffa3ff7461d8dcfb831f0024ed093a3f6f104

Comment by Peter Jones [ 08/Mar/18 ]

Landed for 2.11

Comment by Gerrit Updater [ 13/Mar/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31633
Subject: LU-10681: Disable tiny writes for append
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 6708127aa2d747be0475d4bfabc8503e8ffcf43b
