[LU-1974] File corruptions when running with LU-1442 patch, LU-1703 patch is also required. Created: 18/Sep/12 Updated: 19/Nov/12 Resolved: 19/Sep/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexandre Louvet | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 6320 |
| Description |
|
This bug is mainly open for information and to make the community aware, just in case... So, here is the story. Running with our Bull Lustre 2.1.2 version, the customer started to report file corruptions: a client can successfully create/write/re-read files until another client tries to access the same file. At that point the file contents become corrupted for everyone (missing blocks of data or zero file size...)!

The corruptions were identified as occurring only on OSTs/OSCs where the file-creating client had no grant left ("/proc/fs/lustre/osc/<OST-import>/cur_{dirty|grant|lost_grant}_bytes" are all zero). They also never seem to recover automatically; they only recover when we run a small program doing O_DIRECT writes to these OSTs...

By taking a full Lustre trace of a program/command on a client writing to these "zero-grant" OSTs, we found that -EDQUOT was returned during the standard cached-write path/routines. The direct-IO path was then attempted but ended with -EALREADY, which was finally substituted with a successful null/0 return value, with no page written/flushed to the server at all and no new grant recovered! Looking at the relevant source code, this can only occur if the written page(s) were not set dirty...

In the end we found that this behavior/bug (a missing set_page_dirty() call in vvp_io_commit_write() when cl_page_add_cache() returns -EDQUOT and the code implicitly switches to the direct-IO path, vvp_page_sync_io()) was not in the Lustre v2.1.2 base but was introduced by the patch from This bug has since been fixed by Also, I think that an explicit link/comment should be added in
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 18/Sep/12 ] |
|
Glad you have found the root cause.
| Comment by Peter Jones [ 18/Sep/12 ] |
|
Thanks Bruno. I have added a link between the tickets. Both of these fixes are included in 2.1.3 so hopefully this is not a widespread issue. Do you need any further action or can we close this ticket? |
| Comment by Bruno Faccini (Inactive) [ 19/Sep/12 ] |
|
Yes, for sure the ticket can be closed; I don't think there is anything else to be done.