Details
Bug
Resolution: Duplicate
Major
None
None
None
http://github.com/chaos/lustre, version 2.1.1-11chaos
3
6399
Description
I've been getting widespread reports that users on 2.1 clients are seeing random ENOENT errors on opens (and possibly on stats).
Sometimes the file is written, closed, and reopened on the same client node, yet the reopen reports that the file does not exist. A few minutes later the file is definitely there, so the problem is transient.
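A minimal same-node probe for the write/close/reopen case might look like the Python sketch below. The function names, the 4 KiB payload, and the file-naming scheme are hypothetical choices for illustration, not part of any existing Lustre test. Point it at a directory on the affected mount; with no directory it runs in a local temporary directory as a control that should report 0.

```python
import os
import tempfile
from typing import Optional


def write_close_reopen(path: str, payload: bytes = b"x" * 4096) -> bool:
    """Create and write `path`, close it, then immediately reopen it.

    Returns True if the reopen succeeded and False if it failed with ENOENT.
    Any other OSError propagates to the caller.
    """
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:  # raised when errno == ENOENT
        return False
    os.close(fd)
    return True


def count_enoent(directory: Optional[str] = None, iterations: int = 1000) -> int:
    """Run write/close/reopen on `iterations` fresh files; count ENOENT reopens.

    With directory=None the probe runs in a new local temporary directory,
    which serves as a control that should report 0.
    """
    if directory is None:
        directory = tempfile.mkdtemp()
    failures = 0
    for i in range(iterations):
        path = os.path.join(directory, f"enoent_probe.{os.getpid()}.{i}")
        if not write_close_reopen(path):
            failures += 1
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass
    return failures
```

For example, `count_enoent("/p/lscratch/enoent_test", 10000)` (a hypothetical Lustre path) would return the number of reopens that hit ENOENT out of 10000 attempts.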
We have also seen cases where the ENOENT occurs on a node other than the one that created the file. One node creates, writes, and closes the file, and then another node attempts to open it, only to get ENOENT.
Here is an example failure from a simul test:
09:04:12: Set iteration 4
09:04:12: Running test #0(iter 0): open, shared mode.
09:04:12: Beginning setup
09:04:12: Finished setup (0.001 sec)
09:04:12: Beginning test
09:04:12: Process 177(hype338): FAILED in simul_open, open failed: No such file or directory
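The failure above comes from simul's shared-mode open test, where many processes open a single file. The Python sketch below only approximates that access pattern and is not simul's code: setup creates one file, a barrier releases all participants at once, and each participant opens the file and records any errno. It uses threads on one node, so it will not exercise the cross-node case; for that, the same logic would have to run as separate processes on different clients (for example under MPI, as simul does). The function names and defaults are hypothetical.

```python
import errno
import os
import tempfile
import threading
from typing import Dict, List, Optional


def shared_open_test(path: Optional[str] = None, nprocs: int = 8) -> Dict[int, int]:
    """Create one file, then have `nprocs` threads open it simultaneously.

    Returns a mapping of participant index -> errno for every open() that
    failed. With path=None the file lives in a new local temporary directory.
    """
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "shared_open_probe")

    # Setup: create the shared file once before any participant opens it.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    os.close(fd)

    barrier = threading.Barrier(nprocs)
    failures: Dict[int, int] = {}
    lock = threading.Lock()

    def worker(rank: int) -> None:
        barrier.wait()  # release all participants at (roughly) the same time
        try:
            fd = os.open(path, os.O_RDWR)
        except OSError as e:
            with lock:
                failures[rank] = e.errno
            return
        os.close(fd)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    os.unlink(path)  # cleanup, mirroring a per-iteration teardown
    return failures


def enoent_ranks(failures: Dict[int, int]) -> List[int]:
    """Participants whose open() failed specifically with ENOENT."""
    return sorted(r for r, err in failures.items() if err == errno.ENOENT)
```

Calling `shared_open_test` in a loop against a directory on the Lustre mount and collecting `enoent_ranks` from each iteration would show how often the open in the log above fails.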
There are usually no obvious messages in the console logs associated with these events.