[LU-1380] Potential problem with compiles using Lustre file systems Created: 05/May/12 Updated: 29/May/17 Resolved: 29/May/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.x (1.8.0 - 1.8.5) |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Joe Mervini | Assignee: | Oleg Drokin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Redsky (Sun Blade) cluster using both mdadm based and DDN 9900 based lustre file systems. |
||
| Severity: | 3 |
| Rank (Obsolete): | 10248 |
| Description |
|
One of our users is making the following claim. I would like to know if there is any validity to his claim: A newly created file on a Lustre file system has its write lock removed before the contents of the file have appeared (likely due to Lustre, for performance reasons, delaying writing the contents of the file to service other higher priority requests). We've seen the erroneous behavior in multiple contexts. In the example problem I sent, the Intel compiler had just finished generating an object file (ichos.o) on /gscratch2 and the next step was to invoke the linker to use that object file to link an executable (ichos.exe). The Intel compiler waited until the file system returned indicating that file ichos.o had been written to the disk and the write lock had been removed. The linker immediately fired up and tried to read this file, ichos.o, but it was still being written to the file system while the linker was trying to read it, hence the internal error. Other contexts where we've seen this exact same problem are copies of large numbers of small files (on the order of 1 KB each) across file systems (writing to a Lustre file system) where, bizarrely, one or two files are created with size zero. The source copy of the file was fine; the data was never put into the target file, though the target file was created. The only difference here is that the file contents never show up; in the Intel compiler case above, the file contents are written some period of time after the file's write lock has been released. |
| Comments |
| Comment by Andreas Dilger [ 05/May/12 ] |
|
The behaviour described here should never happen with Lustre. When a client is writing to the file, the client does get a write lock on the file, but it cannot be dropped while there is dirty data in the client cache. Also, unless there is contention on this file from another node (e.g. parallel compile on multiple nodes?) there is little incentive for the client to drop the lock and cached pages. The client should still be able to read newly-written file data. |
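One way to check the claim on a given mount point is a write-then-read loop. The sketch below is a hypothetical test harness, not something from this ticket: the names `write_then_read` and `count_mismatches` are made up for illustration, and it assumes Python 3.7+ is available on the client. It writes a file, closes it, and immediately reads it back from a separate process on the same node, which mirrors the compiler-then-linker sequence in the description.

```python
import os
import subprocess
import sys
import tempfile

# Child-process snippet: dump the file named in argv[1] to stdout as raw bytes.
_READER = (
    "import sys\n"
    "with open(sys.argv[1], 'rb') as f:\n"
    "    sys.stdout.buffer.write(f.read())\n"
)


def write_then_read(directory, size=1024):
    """Write `size` random bytes to a new file in `directory`, close it,
    then read it back from a separate process. Returns True if the bytes match."""
    payload = os.urandom(size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        # The file is closed at this point, as the object file would be
        # before the linker starts.
        reader = subprocess.run(
            [sys.executable, "-c", _READER, path],
            capture_output=True,
            check=True,
        )
        return reader.stdout == payload
    finally:
        os.unlink(path)


def count_mismatches(directory, iterations=100, size=1024):
    """Repeat the write-then-read check and count how often it fails."""
    return sum(not write_then_read(directory, size) for _ in range(iterations))


if __name__ == "__main__":
    # Hypothetical usage: pass a directory on the file system under test.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(count_mismatches(target))
```

A nonzero count from `count_mismatches` on a directory under the Lustre mount would support the user's report. Because both processes run on one client, this does not exercise the multi-node contention case mentioned above; that would need the reader to run on a different node.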
| Comment by Peter Jones [ 06/May/12 ] |
|
Oleg, could you please comment on this one? Thanks, Peter |
| Comment by Andreas Dilger [ 29/May/17 ] |
|
Close old ticket. |