Lustre / LU-18071

Single-client job with -o flock leaves a zombie flock on a file; -o localflock works fine

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.15.5
    • None
    • 3

    Description

      A single-node MPI data analysis job (8 CPUs) runs on the client. The job completes fine, but subsequent runs fail because a file in the dataset remains locked. A remount clears the file lock.

      After applying LU-17692 and LU-17589 on the client, the error message below is issued on the client when the file lock status is checked after the job has run successfully.

      00000080:00020000:7.0F:1722023688.289109:0:2533:0:(file.c:4899:ll_file_flock()) Flock LR mismatch! inode=[0x200000bd2:0xae:0x0], flags=0x80000, mode=2, pid=2318/2533, start=0/0, end=0/9223372036854775807,type=0/2

      After remounting with -o localflock, the job runs fine and no zombie flock remains.

      After going back to mounting with '-o flock' on the client and re-running the application job, the file remains locked. Any attempt to release the lock triggers the "Flock LR mismatch!" error on the client.

      00000080:00020000:7.0:1722024764.304245:0:3100:0:(file.c:4899:ll_file_flock()) Flock LR mismatch! inode=[0x200000bd2:0xae:0x0], flags=0x80000, mode=2, pid=2778/3100, start=0/0, end=0/9223372036854775807,type=0/2
      00000080:00020000:7.0:1722025303.376488:0:3179:0:(file.c:4899:ll_file_flock()) Flock LR mismatch! inode=[0x200000bd2:0xae:0x0], flags=0x80000, mode=2, pid=2778/3179, start=0/0, end=0/9223372036854775807,type=0/2
      Debug log: 390 lines, 390 kept, 0 dropped, 0 bad.
      00000080:00020000:7.0F:1722025407.995185:0:3184:0:(file.c:4899:ll_file_flock()) Flock LR mismatch! inode=[0x200000bd2:0xae:0x0], flags=0x80000, mode=2, pid=2778/3184, start=0/0, end=0/9223372036854775807,type=0/2
      Debug log: 1 lines, 1 kept, 0 dropped, 0 bad.

      I also used the attached test program (wholocked.c) to check the lock.
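
      The attached program is not reproduced here. As a rough illustration of how such a check can be done, the following is a minimal sketch that probes a file with fcntl(F_GETLK) and reports whether another process holds a conflicting lock. The function name query_lock_holder is hypothetical and is not taken from the attachment. F_GETLK probes POSIX record locks held by other processes; whether it also reports locks taken with flock() depends on the filesystem, so a "no conflict" result should be read with care.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from wholocked.c): ask the kernel whether a
 * whole-file write lock on `path` would conflict with an existing lock.
 *
 * Returns  1 if a conflicting lock is reported (details left in *holder),
 *          0 if no conflicting lock is reported,
 *         -1 on error, with errno set by open() or fcntl().
 */
int query_lock_holder(const char *path, struct flock *holder)
{
    /* Read-only access is enough to probe; no lock is actually taken. */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Describe the lock we pretend to want: write lock, whole file. */
    memset(holder, 0, sizeof(*holder));
    holder->l_type = F_WRLCK;
    holder->l_whence = SEEK_SET;
    holder->l_start = 0;
    holder->l_len = 0;              /* 0 means "through end of file" */

    int rc = fcntl(fd, F_GETLK, holder);

    /* Preserve errno from fcntl() across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;

    if (rc < 0)
        return -1;

    /* The kernel sets l_type to F_UNLCK when nothing conflicts. */
    return holder->l_type == F_UNLCK ? 0 : 1;
}
```

      On a client, the path passed in would be the dataset file that stays locked. A return value of 1, with holder.l_pid naming a process that has already exited, would be consistent with the zombie lock described above; holder.l_type, l_start and l_len describe the reported lock.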

       

      Attachments

        1. wholocked.c (3 kB)
        2. r2u31n1-ldebug.txt (8 kB)
        3. r2u05n1-ldebug.txt (19 kB)
        4. LU18071-dk-server.txt (720 kB)
        5. LU18071-dk-client.txt (11 kB)
        6. debug02.flock.4core.txt (16.05 MB)

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Jeff Johnson (aeonjeff)