  1. Lustre
  2. LU-11844

IO pattern causing writes to hang to OST

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Lustre 2.10.4
    • Fix Version/s: None
    • Labels:
      None
    • Severity:
      3

      Description

      We have a Fortran code that does small writes to a status file. This code worked well on Lustre 2.5.x. On Lustre 2.10.4 we’re experiencing a bug with the same code where one of the writes will hang for about 20 seconds. During that period all writes to the affected OST will hang. I/O to other OSTs works, and I/O from other nodes is unaffected.

       

      The strace of the application shows that it opens a file, truncates it to 0, writes a character, truncates it to 1, continues writing, closes the file, and then repeats. After a few cycles, the write that passes the 4k offset into the file will hang for about 30 seconds.

       

      The strace looks like this:

      open("short_file", O_RDWR|O_CREAT, 0700) = 3
      ftruncate(3, 0) = 0
      write(3, "/", 1) = 1
      ftruncate(3, 1) = 0
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130) = 130
      write(3, "atHUpw+orbPHuU55Em+XvliyYOwQg2le"..., 130 ← Hangs Here

       

      The Lustre debug traces from the client show the hang and indicate multiple calls to genops.c:1990:obd_stale_export_get. Attached are a reproducer and the lctl dk output captured after issuing: echo "trace nettrace dlmtrace rpctrace vfstrace" > /proc/sys/lnet/debug

       

       

        Attachments

        1. repro.c
          0.6 kB
        2. dk_oss.txt
          2.28 MB
        3. dk_mds.txt
          4.08 MB
        4. dk_client.txt
          5.53 MB


              People

              • Assignee:
                wc-triage WC Triage
                Reporter:
                apargal Alex Parga