Details
Bug
Resolution: Unresolved
Major
None
Lustre 2.15.5
None
3
9223372036854775807
Description
A single-node MPI data analysis job (8 CPUs) runs on the client. The job completes fine, but subsequent runs fail because a file in the dataset remains locked. A remount clears the file lock.
After applying LU-17692 and LU-17589 on the client, the error message below is issued on the client once the job has run successfully and the file lock status is then checked.
00000080:00020000:7.0F:1722023688.289109:0:2533:0:(file.c:4899:ll_file_flock()) Flock LR mismatch! inode=[0x200000bd2:0xae:0x0], flags=0x80000, mode=2, pid=2318/2533, start=0/0, end=0/9223372036854775807,type=0/2
After remounting with -o localflock, the job runs fine and no zombie flock remains.
After switching the client back to mounting with '-o flock' and re-running the application job, the file remains locked. Any attempt to release the lock triggers the "Flock LR mismatch!" error on the client.
00000080:00020000:7.0:1722024764.304245:0:3100:0:(file.c:4899:ll_file_flock()) Flock LR mismatch! inode=[0x200000bd2:0xae:0x0], flags=0x80000, mode=2, pid=2778/3100, start=0/0, end=0/9223372036854775807,type=0/2
00000080:00020000:7.0:1722025303.376488:0:3179:0:(file.c:4899:ll_file_flock()) Flock LR mismatch! inode=[0x200000bd2:0xae:0x0], flags=0x80000, mode=2, pid=2778/3179, start=0/0, end=0/9223372036854775807,type=0/2
Debug log: 390 lines, 390 kept, 0 dropped, 0 bad.
00000080:00020000:7.0F:1722025407.995185:0:3184:0:(file.c:4899:ll_file_flock()) Flock LR mismatch! inode=[0x200000bd2:0xae:0x0], flags=0x80000, mode=2, pid=2778/3184, start=0/0, end=0/9223372036854775807,type=0/2
Debug log: 1 lines, 1 kept, 0 dropped, 0 bad.
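For comparing the mismatch lines across runs, a small parser may help. The sketch below is not part of the report. It assumes the header layout subsystem:mask:cpu:sec.usec:stack:pid:host_pid:(file:line:function()) and treats the name=value pairs in the message as opaque strings, since the report does not say what each side of an "a/b" pair means. The function name parse_flock_line is hypothetical. Note that the end value 9223372036854775807 equals 2^63 - 1, the largest signed 64-bit offset.

```python
import re

# Assumed header layout of one debug log line:
# subsystem:mask:cpu:sec.usec:stack:pid:host_pid:(file:line:function()) message
LINE_RE = re.compile(
    r"^(?P<subsystem>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):(?P<cpu>[^:]+):"
    r"(?P<timestamp>\d+\.\d+):(?P<stack>\d+):(?P<pid>\d+):(?P<host_pid>\d+):"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<function>\w+)\(\)\)\s+(?P<message>.*)$"
)

# name=value pairs inside the message; a value is either a bracketed
# identifier such as [0x200000bd2:0xae:0x0] or a run without commas/spaces.
FIELD_RE = re.compile(r"(\w+)=(\[[^\]]*\]|[^,\s]+)")


def parse_flock_line(line):
    """Return a dict of header and message fields, or None if the line
    does not look like a debug log entry (e.g. the 'Debug log:' summary)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Values are kept as strings exactly as printed, e.g. "2778/3100".
    record["fields"] = dict(FIELD_RE.findall(record["message"]))
    return record
```

Calling parse_flock_line on each line of the log and printing record["timestamp"], record["pid"] and record["fields"]["pid"] gives a compact view of which process triggered each mismatch.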
I also used a test program (attached) to check the lock.
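The attached program is not reproduced here. As a rough illustration of one way to check for a leftover lock, the sketch below tries to take a non-blocking exclusive lock on the whole file and reports whether that attempt is refused. It assumes the job uses POSIX byte-range locks (fcntl/lockf) rather than flock(2), which the report does not state. The function name probe_lock is hypothetical.

```python
import errno
import fcntl
import os


def probe_lock(path):
    """Return True if a conflicting lock held by another process blocks
    an exclusive whole-file lock on path, False otherwise."""
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # Non-blocking exclusive POSIX lock covering the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # EACCES or EAGAIN means some other process holds a lock.
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return True
            raise
        # The probe succeeded, so release the lock it just took.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        # Closing the descriptor also drops any POSIX locks this process
        # holds on the file, so run the probe from a separate process.
        os.close(fd)
```

Running probe_lock on the affected dataset file after the job exits would return True if the zombie lock is still present, under the assumptions above.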