Details
Bug
Resolution: Duplicate
Major
http://github.com/chaos/lustre, version 2.1.1-11chaos
3
6399
Description
I've been getting widespread reports that users of 2.1 clients are seeing random ENOENT errors on open (and possibly on stat as well).
Sometimes the file is written, closed, and reopened on the same client node, yet the reopen reports that the file does not exist. A few minutes later the file is definitely there, so the problem is transient.
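A small same-node reproducer for this pattern might look like the sketch below. It is not part of the original report: the function name `reopen_check`, the file naming, and the retry delay are hypothetical. It creates, writes, and closes a file, immediately reopens it read-only, and records every ENOENT along with whether the file was visible on a delayed re-check (i.e. whether the failure was transient).

```python
import errno
import os
import tempfile
import time


def reopen_check(directory, iterations=100, retry_delay=0.0):
    """Hypothetical reproducer: create/write/close, then reopen each file.

    Returns a list of (path, appeared_later) tuples, one per reopen that
    failed with ENOENT. appeared_later is True if the file existed when
    re-checked after retry_delay seconds.
    """
    failures = []
    for i in range(iterations):
        path = os.path.join(directory, f"reopen_{os.getpid()}_{i}")
        # Create, write and close the file on this node.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
        try:
            os.write(fd, b"payload\n")
        finally:
            os.close(fd)
        # Reopen immediately; the reported bug shows up here as ENOENT.
        try:
            os.close(os.open(path, os.O_RDONLY))
        except OSError as exc:
            if exc.errno != errno.ENOENT:
                raise
            time.sleep(retry_delay)
            failures.append((path, os.path.exists(path)))
    return failures


if __name__ == "__main__":
    # Assumption: replace the temp dir with a directory on the Lustre mount.
    print(reopen_check(tempfile.mkdtemp(), iterations=10, retry_delay=1.0))
```

On a local file system the result should be an empty list; on an affected client, entries with `appeared_later=True` would match the transient behavior described above. For the cross-node case, the create/write/close half would have to run on one node and the reopen on another.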
We have also seen cases where the ENOENT occurs on a node other than the one that created the file: one node creates, writes, and closes the file, and then another node attempts to open it, only to get ENOENT.
Here is an example failure from a simul test:
09:04:12: Set iteration 4
09:04:12: Running test #0(iter 0): open, shared mode.
09:04:12: Beginning setup
09:04:12: Finished setup (0.001 sec)
09:04:12: Beginning test
09:04:12: Process 177(hype338): FAILED in simul_open, open failed: No such file or directory
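The failing simul test is "open, shared mode", where a setup phase creates one file and then every process opens it. As a rough single-node approximation (an assumption: threads stand in for simul's MPI processes, and the name `shared_open_test` is hypothetical), the sketch below creates the file once and then releases all opens at the same moment with a barrier, returning the worker indices that got ENOENT.

```python
import errno
import os
import tempfile
import threading


def shared_open_test(directory, nworkers=8):
    """Hypothetical approximation of a shared-mode open test.

    Returns the list of worker indices whose open failed with ENOENT.
    """
    path = os.path.join(directory, "shared_open_file")
    # Setup phase: create the shared file once.
    os.close(os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644))

    start = threading.Barrier(nworkers)
    results = [0] * nworkers  # errno per worker, 0 on success

    def worker(rank):
        start.wait()  # release all opens at roughly the same time
        try:
            os.close(os.open(path, os.O_RDWR))
        except OSError as exc:
            results[rank] = exc.errno

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [r for r, err in enumerate(results) if err == errno.ENOENT]


if __name__ == "__main__":
    # Assumption: point this at a Lustre directory to exercise the client.
    print(shared_open_test(tempfile.mkdtemp()))
```

This only exercises a single client; the multi-node failure in the log above would still need simul (or another MPI job) spread across nodes.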
There are usually no obvious messages in the console logs associated with these events.
Thanks Prakash. We will track landing this code under LU-506, so I am closing this ticket as a duplicate of that one. As to whether this fix will also address the instances you may have observed before applying the initially flawed LU-1234 patch, it may well do so, because the cache mechanism has been altered by this change.