[LU-6347] Random 'forrtl: severe (39): error during read' Errors Created: 06/Mar/15 Updated: 30/Apr/15 Resolved: 30/Apr/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Mahmoud Hanafi | Assignee: | Hongchao Zhang |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 17768 |
| Description |
|
We have a user who gets the Fortran error below when trying to read some restart files. The error moves between files and is not consistent. This was the specific error and the FID of the file:

Reading file unit: 1232
forrtl: severe (39): error during read, unit 1232, file /nobackupp9/pbalakum/TURBULENCE/3D_TURBULENCE/TURB1_10595_DNS_COMPACT_512_512_512/fort.1232
Image              PC                Routine            Line        Source
read_file          000000000047C351  Unknown               Unknown  Unknown
read_file          000000000047B325  Unknown               Unknown  Unknown
read_file          000000000043687A  Unknown               Unknown  Unknown
read_file          0000000000408872  Unknown               Unknown  Unknown
read_file          00000000004080A1  Unknown               Unknown  Unknown
read_file          000000000041949F  Unknown               Unknown  Unknown
read_file          0000000000402F5D  Unknown               Unknown  Unknown
read_file          0000000000402BFC  Unknown               Unknown  Unknown
libc.so.6          00007FFFED0F5C36  Unknown               Unknown  Unknown
read_file          0000000000402AF9  Unknown               Unknown  Unknown

r401i2n10 /nobackupp9/pbalakum/TURBULENCE/3D_TURBULENCE/TURB1_10595_DNS_COMPACT_512_512_512 # lfs path2fid /nobackupp9/pbalakum/TURBULENCE/3D_TURBULENCE/TURB1_10595_DNS_COMPACT_512_512_512/fort.1232
[0x20009c845:0x14ddf:0x0]

I will upload the debug logs to the ftp site and post the file |
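When searching the client, OSS, and MDS debug logs for this file, it can help to split the FID printed by `lfs path2fid` into its three fields (sequence, object ID, version). The following is a minimal sketch; the names `Fid` and `parse_fid` are hypothetical helpers for illustration, not part of any Lustre tool.

```python
import re
from typing import NamedTuple


class Fid(NamedTuple):
    """The three colon-separated fields of a Lustre FID."""
    seq: int  # sequence number
    oid: int  # object ID within the sequence
    ver: int  # version


# Accepts the form printed by `lfs path2fid`, with or without brackets.
_FID_RE = re.compile(
    r"\[?(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]?"
)


def parse_fid(text: str) -> Fid:
    """Parse a FID string such as '[0x20009c845:0x14ddf:0x0]'."""
    match = _FID_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a FID: {text!r}")
    return Fid(*(int(group, 16) for group in match.groups()))


fid = parse_fid("[0x20009c845:0x14ddf:0x0]")
print(f"seq={fid.seq:#x} oid={fid.oid:#x} ver={fid.ver:#x}")
```

The hex forms of the individual fields can then be used as search strings in the debug logs; whether a given log line prints the FID in exactly this form is an assumption to verify against the logs themselves.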
| Comments |
| Comment by Mahmoud Hanafi [ 06/Mar/15 ] |
|
I uploaded debug logs to the following file: ftp.whamcloud.com:uploads/LU6347/lu6347.tar. service161-service176 are the OSSes.

Contents of lu6347.tar:
clientdebug.error.gz
filerror.out.gz.service161
filerror.out.gz.service162
filerror.out.gz.service163
filerror.out.gz.service164
filerror.out.gz.service165
filerror.out.gz.service166
filerror.out.gz.service167
filerror.out.gz.service168
filerror.out.gz.service169
filerror.out.gz.service170
filerror.out.gz.service171
filerror.out.gz.service172
filerror.out.gz.service173
filerror.out.gz.service174
filerror.out.gz.service175
filerror.out.gz.service176
mdsdebug.out.bz2 |
| Comment by Peter Jones [ 08/Mar/15 ] |
|
Hongchao, could you please advise on this issue? Thanks, Peter |
| Comment by Hongchao Zhang [ 09/Mar/15 ] |
|
Hi Mahmoud, Thanks for the detailed logs for this ticket. I have checked them but I can't find the error which could cause

Can the file offset be deduced from the log "Reading file unit: 1232 forrtl: severe (39): error during read, unit 1232"? |
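The runtime message itself carries no offset, but one way to narrow it down is to walk the file's record structure and report where it stops being consistent. The sketch below rests on an assumption the ticket does not confirm: that fort.1232 is an unformatted sequential file in the common layout where each record is framed by a 4-byte little-endian length before and after the payload. The function name `walk_records` is hypothetical, and records split into subrecords (negative length markers) are reported rather than handled.

```python
import struct
from typing import BinaryIO, Iterator, Tuple


def walk_records(f: BinaryIO) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) per record; raise ValueError at the first bad one.

    Assumes 4-byte little-endian leading and trailing length markers.
    """
    while True:
        offset = f.tell()
        head = f.read(4)
        if not head:
            return  # clean end of file
        if len(head) < 4:
            raise ValueError(f"truncated leading marker at offset {offset}")
        (length,) = struct.unpack("<i", head)
        if length < 0:
            # Negative markers indicate subrecords, which this sketch skips.
            raise ValueError(f"subrecord or corrupt marker at offset {offset}")
        f.seek(length, 1)  # skip the payload without reading it
        tail = f.read(4)
        if len(tail) < 4 or struct.unpack("<i", tail)[0] != length:
            raise ValueError(
                f"record at offset {offset} (length {length}) "
                "has a bad trailing marker"
            )
        yield offset, length
```

Running this over a copy of the file would list each record's starting offset and stop at the first record whose markers disagree, which gives a candidate offset to look for in the client and OSS logs.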
| Comment by Mahmoud Hanafi [ 11/Mar/15 ] |
|
We are not able to get any additional info from the user code or the Fortran library. However, we have an Intel compiler ticket regarding this issue, number 6000089383. We would like you to engage your compiler developers to help provide additional info about the specifics of the error. |
| Comment by Hongchao Zhang [ 12/Mar/15 ] |
|
Thanks, I will contact them about it. Could you please copy the affected file [0x20009c845:0x14ddf:0x0] out of Lustre and check with the application to see |
| Comment by Mahmoud Hanafi [ 12/Mar/15 ] |
|
The key point is that it is not always the same file, and the error doesn't happen every time on the same file. But I will try to see if I can reproduce it on a different filesystem. |
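Because the failure is intermittent, one way to compare filesystems is to read the same file many times and tally the outcomes. The sketch below is only an illustration: `read_digest` and `repeated_read_check` are hypothetical names, and repeated reads on one client may be served from the client cache, so a clean tally does not rule out a problem.

```python
import hashlib
from collections import Counter


def read_digest(path, chunk_size=1 << 20):
    """Read the whole file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_read_check(path, attempts=10):
    """Read `path` `attempts` times and count each distinct outcome.

    An outcome is either the file's digest or a short description of the
    OSError raised while reading it.
    """
    outcomes = Counter()
    for _ in range(attempts):
        try:
            outcomes[read_digest(path)] += 1
        except OSError as exc:
            outcomes[f"OSError errno={exc.errno}"] += 1
    return outcomes
```

A healthy file should produce a single digest with a count equal to `attempts`; more than one key in the result means some reads failed or returned different data.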
| Comment by Hongchao Zhang [ 13/Mar/15 ] |
|
Hi Mahmoud, could you please use the "GETLASTERROR" call mentioned in 6000089383 to get the actual error returned by the OS? Thanks! |
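Independently of the Fortran runtime, the OS-level error can also be captured by reading the file with plain system calls. The following is a minimal sketch of that idea, not the call named in the compiler ticket; `read_with_os_error` is a hypothetical helper. It also returns how many bytes were read before the failure, which approximates the file offset of the error.

```python
import errno
import os


def _describe(exc):
    """Return (errno value, symbolic name, message) for an OSError."""
    return exc.errno, errno.errorcode.get(exc.errno, "?"), exc.strerror


def read_with_os_error(path, chunk_size=1 << 20):
    """Read `path` to the end using os.read.

    Returns (bytes_read, None) on success, or (bytes_read, error) where
    error is the tuple from _describe() for the first failing call.
    """
    total = 0
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return total, _describe(exc)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                return total, None  # reached end of file
            total += len(chunk)
    except OSError as exc:
        return total, _describe(exc)
    finally:
        os.close(fd)
```

If a read fails, the symbolic name (for example `EIO`) is the value to compare against the client and server debug logs.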
| Comment by John Fuchs-Chesney (Inactive) [ 24/Apr/15 ] |
|
Hello Mahmoud, Do you need any more Lustre related work done on this ticket? If not then I would like to close it. Thanks, |
| Comment by Mahmoud Hanafi [ 30/Apr/15 ] |
|
This can be closed as we have |
| Comment by John Fuchs-Chesney (Inactive) [ 30/Apr/15 ] |
|
Thanks Mahmoud. |