Details
-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
Lustre 2.4.3
-
None
-
Client: 2.4.3
Server: 2.4.3
-
3
Description
One of our filesystems is experiencing what we believe are short reads, which result in NaNs when using the MPI-IO function call 'MPI_FILE_READ_AT_ALL'.
This can be reproduced every time the data is read from disk rather than from cache. Running echo 1 > /proc/sys/vm/drop_caches and then running the code produces the error every time, but running the code a second or third time (when the data is served from cache) does not produce the error.
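For anyone trying to confirm the short-read theory outside of MPI-IO, the following is a minimal sketch that works at the POSIX level rather than through 'MPI_FILE_READ_AT_ALL'. It is not the code from the failing application. The helper name `read_exact` is hypothetical; it counts how often `os.pread` returns fewer bytes than requested before end of file, and retries until the full range is read.

```python
import os


def read_exact(fd, offset, length):
    """Read up to `length` bytes starting at `offset`, retrying after short reads.

    Returns (data, short_reads). `short_reads` counts pread() calls that
    returned fewer bytes than requested even though the requested range
    lay entirely before end of file.
    """
    size = os.fstat(fd).st_size
    end = min(offset + length, size)  # never expect data past EOF
    chunks = []
    short_reads = 0
    pos = offset
    while pos < end:
        chunk = os.pread(fd, end - pos, pos)
        if not chunk:
            # Unexpected EOF (for example, the file was truncated meanwhile).
            break
        if len(chunk) < end - pos:
            short_reads += 1
        chunks.append(chunk)
        pos += len(chunk)
    return b"".join(chunks), short_reads
```

Running this against the affected file right after dropping caches, and again once the data is cached, would show whether `short_reads` is non-zero only in the cold-cache case.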
NOTE:
This occurs only when the file is striped across more than one OST.
In the debug logs, the data file has a FID of [0x2000b2ebc:0x358:0x0].
During debugging I disabled read-ahead.
I have captured a full Lustre debug trace on the client and will upload it to the tftp site.
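Because the problem only appears when the file is striped across more than one OST, it may help to check which stripes a failing read touches. The sketch below assumes a plain round-robin (RAID0) layout; the function name `stripes_touched` is hypothetical, and `stripe_size` and `stripe_count` are assumed to be taken from `lfs getstripe` output for the file.

```python
def stripes_touched(offset, length, stripe_size, stripe_count):
    """Return the sorted stripe indices covered by [offset, offset + length).

    Assumes a round-robin (RAID0) layout, where consecutive
    `stripe_size` chunks of the file go to consecutive stripes.
    """
    if length <= 0:
        return []
    first_chunk = offset // stripe_size
    last_chunk = (offset + length - 1) // stripe_size
    # After stripe_count chunks, every stripe has been visited once.
    n_chunks = min(last_chunk - first_chunk + 1, stripe_count)
    return sorted({(first_chunk + i) % stripe_count for i in range(n_chunks)})
```

For example, with a 1 MiB stripe size and 4 stripes, a 2-byte read starting one byte before the 1 MiB boundary touches stripes 0 and 1, which is the kind of request that could come back short if data from the second OST is not returned.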
Issue Links
- is related to LU-6389 read()/write() returning less than available bytes intermittently (Resolved)