Lustre / LU-6545

MPIIO short reads


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.3
    • Labels: None
    • Environment: client 2.4.3, server 2.4.3
    • Severity: 3

      One of our filesystems is experiencing what we believe are short reads, which result in NaNs when reading with the MPI-IO function MPI_FILE_READ_AT_ALL.

      The problem reproduces every time the data is read from disk rather than from the page cache. Running echo 1 > /proc/sys/vm/drop_caches and then running the code produces the error every time, but a second or third run of the code (now served from cache) does not.
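      A minimal Python sketch of a check that may help separate the MPI-IO path from the plain POSIX read path. It assumes, since the report does not say, that the data file holds native-endian 8-byte doubles; read_exact and check_doubles are hypothetical helpers, not part of Lustre or MPI. The byte range is read with os.pread, retrying whenever the call returns fewer bytes than requested, and the NaNs in the result are counted along with the file offset of the first one.

```python
import math
import os
import struct


def read_exact(fd, length, offset):
    """Read `length` bytes starting at `offset`, retrying on short reads.

    os.pread may legally return fewer bytes than requested, so loop until
    the whole range has been read or EOF (an empty result) is reached.
    """
    chunks = []
    remaining = length
    pos = offset
    while remaining > 0:
        chunk = os.pread(fd, remaining, pos)
        if not chunk:  # EOF before the requested range was filled
            break
        chunks.append(chunk)
        pos += len(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def check_doubles(path, count, offset=0):
    """Return (bytes_read, nan_count, first_nan_offset) for `count` doubles.

    ASSUMPTION: the file holds native-endian 8-byte doubles starting at `offset`.
    first_nan_offset is a byte offset in the file, or None if no NaN was found.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        data = read_exact(fd, count * 8, offset)
    finally:
        os.close(fd)
    usable = len(data) // 8  # ignore a trailing partial value, if any
    values = struct.unpack(f"={usable}d", data[: usable * 8])
    nan_offsets = [offset + 8 * i for i, v in enumerate(values) if math.isnan(v)]
    first = nan_offsets[0] if nan_offsets else None
    return len(data), len(nan_offsets), first
```

      Run right after drop_caches, a result with zero NaNs and the full 8 * count bytes would suggest the file contents reach the client intact through read(2), pointing back at the MPI-IO (ROMIO) path; fewer bytes than requested can only mean EOF here, because short reads are retried.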

      NOTE:
      This occurs only when the file is striped across more than one OST.
      In the debug logs, the data file has FID [0x2000b2ebc:0x358:0x0].
      I disabled read-ahead during debugging.
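      Because the failure appears only with more than one stripe, it may help to map a bad file offset (for example, the first NaN found) to the OST object that holds it. A sketch assuming a plain RAID-0 layout, with stripe_size and stripe_count taken from lfs getstripe for the file; stripe_location is a hypothetical helper. The returned index is the position in the file's object list as printed by lfs getstripe, not the OST number itself.

```python
def stripe_location(offset, stripe_size, stripe_count):
    """Map a file byte offset to (layout_index, object_offset).

    ASSUMPTION: plain RAID-0 striping. Stripe units of `stripe_size` bytes
    are dealt round-robin across `stripe_count` OST objects.
    """
    stripe_number = offset // stripe_size         # stripe unit within the whole file
    layout_index = stripe_number % stripe_count   # which object in the layout
    stripe_round = stripe_number // stripe_count  # full rounds before this unit
    object_offset = stripe_round * stripe_size + offset % stripe_size
    return layout_index, object_offset
```

      With a 1 MiB stripe size and a stripe count of 4, offset 3 MiB + 10 maps to layout index 3 at object offset 10, and offset 4 MiB + 5 wraps back to layout index 0 at object offset 1 MiB + 5.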

      I have captured a full Lustre debug trace on the client and will upload it to the tftp site.

            Assignee: Zhenyu Xu (bobijam)
            Reporter: Mahmoud Hanafi (mhanafi)
