Details
Bug
Resolution: Fixed
Critical
Lustre 2.9.0
2
Description
This is a duplicate of LU-9208; I am opening this case to track the issue for NASA. We started to see this once we updated the clients to SLES 12 SP2 and Lustre 2.9.
Using the test code provided in LU-9208 (miranda), I was able to reproduce the bug on a single node:
Iteration = 868, Run Time = 0.9614 sec., Transfer Rate = 120.7790 10e+06 Bytes/sec/proc
Iteration = 869, Run Time = 1.5308 sec., Transfer Rate = 75.8561 10e+06 Bytes/sec/proc
forrtl: severe (121): Cannot access current working directory for unit 10012, file "Unknown"
Image              PC                Routine            Line        Source
miranda            0000000000409F29  Unknown               Unknown  Unknown
miranda            00000000004169D2  Unknown               Unknown  Unknown
miranda            0000000000404045  Unknown               Unknown  Unknown
miranda            0000000000402FDE  Unknown               Unknown  Unknown
libc.so.6          00002AAAAB5B96E5  Unknown               Unknown  Unknown
miranda            0000000000402EE9  Unknown               Unknown  Unknown
MPT ERROR: MPI_COMM_WORLD rank 12 has terminated without calling MPI_Finalize()
	aborting job
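The forrtl message says the Fortran runtime could not resolve the process's current working directory. Assuming (not confirmed in this report) that this ultimately surfaces as a failing POSIX getcwd() call on the Lustre client, a quick way to check independently of miranda is to call getcwd() in a loop from a directory on the affected file system. This is a minimal, hypothetical sketch; `probe_cwd` is an illustrative name and is not part of miranda or Lustre.

```python
import errno
import os


def probe_cwd(iterations: int = 1000):
    """Hypothetical probe: call os.getcwd() repeatedly.

    Returns (iteration, errno_name) for the first failing call,
    or None if every call succeeded.
    """
    for i in range(iterations):
        try:
            os.getcwd()  # thin wrapper around the POSIX getcwd() call
        except OSError as exc:
            # Translate the numeric errno (e.g. 2) into its symbolic name (e.g. "ENOENT").
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return i, name
    return None


if __name__ == "__main__":
    result = probe_cwd(100000)
    if result is None:
        print("getcwd() succeeded on every iteration")
    else:
        print(f"getcwd() failed at iteration {result[0]}: {result[1]}")
```

Run it with the working directory set to a directory on the Lustre mount; a None result means getcwd() never failed during that run, which would point away from a plain getcwd() failure as the trigger.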
I was able to capture some debug logs, which I have attached to the case. I was unable to reproduce the bug with "+trace" enabled, but will continue to try.