Lustre / LU-6545

MPIIO short reads

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.3
    • Labels: None
    • Environment: Client: 2.4.3; Server: 2.4.3
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      One of our filesystems is experiencing what we suspect are short reads, which result in NaNs when using the MPI-IO function call 'MPI_FILE_READ_AT_ALL'.

      This can be reproduced every time if the data is read from disk rather than from cache. Running echo 1 > /proc/sys/vm/drop_caches and then running the code produces the error every time, but running the code a second or third time does not produce the error.
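To help separate a short read from bad data on disk, a check outside MPI-IO may be useful. The following is a minimal sketch, not the reporter's code: it uses plain POSIX `pread` through Python rather than `MPI_FILE_READ_AT_ALL`, and the helper names `find_short_reads` and `count_nans`, the 1 MiB chunk size, and the assumption that the file holds native-endian 64-bit floats are all hypothetical.

```python
import math
import os
import struct


def find_short_reads(path, chunk_size=1 << 20):
    """Return (offset, expected, got) for each pread that returned fewer bytes than expected."""
    short = []
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while offset < size:
            # The final chunk is legitimately smaller, so derive the
            # expected length from the file size.
            expected = min(chunk_size, size - offset)
            got = len(os.pread(fd, expected, offset))
            if got != expected:
                short.append((offset, expected, got))
            offset += expected
    finally:
        os.close(fd)
    return short


def count_nans(path):
    """Count NaN values, treating the file as native-endian float64 (an assumption)."""
    with open(path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % 8  # ignore a trailing partial value
    return sum(1 for (v,) in struct.iter_unpack("=d", data[:usable]) if math.isnan(v))
```

Running both helpers once right after dropping caches and once again with the data cached would show whether the byte counts or the NaN count differ between the two cases.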

      NOTE:
      This occurs only when the file is striped across more than one OST.
      In the debug logs, the data file has a FID of [0x2000b2ebc:0x358:0x0].
      During debugging I disabled readahead.
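Because the problem appears only on files striped across more than one OST, it can help to see which reads cross a stripe boundary. The sketch below assumes plain round-robin (RAID0) striping; the function name `split_by_stripe` is hypothetical, and the 1 MiB stripe size and stripe count of 2 are illustrative values, not taken from this report. The returned index is the stripe's position within the file layout, not an OST number.

```python
def split_by_stripe(offset, length, stripe_size=1 << 20, stripe_count=2):
    """Split a byte range into per-stripe segments under round-robin striping.

    Returns a list of (stripe_index, file_offset, segment_length) tuples.
    """
    segments = []
    end = offset + length
    while offset < end:
        stripe_number = offset // stripe_size
        stripe_end = (stripe_number + 1) * stripe_size
        # Cut the segment at the stripe boundary or at the end of the range.
        seg_len = min(end, stripe_end) - offset
        segments.append((stripe_number % stripe_count, offset, seg_len))
        offset += seg_len
    return segments


# An 8 KiB read that starts 4 KiB before the first stripe boundary
# is split into two segments on different stripes.
print(split_by_stripe((1 << 20) - 4096, 8192))
```

With a stripe count of 1, every segment maps to stripe index 0, which matches the observation that single-stripe files do not show the problem.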

      I have captured a full Lustre debug trace on the client and will upload it to the tftp site.

            People

              Assignee: Zhenyu Xu (bobijam)
              Reporter: Mahmoud Hanafi (mhanafi)
              Votes: 0
              Watchers: 6