  1. Lustre
  2. LU-13899

IOR data corruption detected during automated lnet fofb testing

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.14.0
    • None
    • None
    • 3

    Description

      Seeing data corruption on two IOR jobs while running the LNet failover test with "Regression write/verify DNE2 PFL". In both failures the corrupted regions contain data from the immediately preceding run, which suggests a cache-related issue. fsync-related warnings are also logged.

      CL_IOR_all_wr_20iter_1666K_rand

       access             = file-per-process
       	pattern            = strided (33 segments)
       	ordering in a file = random offsets
       	ordering inter file=constant task offsets = 1
       	clients            = 48 (4 per node)
       	repetitions        = 20
       	xfersize           = 1.63 MiB
       	blocksize          = 27.66 MiB
       	aggregate filesize = 42.78 GiB
      
       read      1316.54    28322      1666.00    0.001729   33.27      0.010429   33.28      16   XXCEL
       Using Time Stamp 1594461626 (0x5f098dba) for Data Signature
       Commencing write performance test.
       Sat Jul 11 05:00:26 2020
       read      1852.87    28322      1666.00    0.001450   23.64      0.004490   23.64      17   XXCEL
       Using Time Stamp 1594461701 (0x5f098e05) for Data Signature
       Commencing write performance test.
       Sat Jul 11 05:01:41 2020
       WARNING: cannot perform fsync on file.
       WARNING: cannot perform fsync on file.
       write     197.50     28322      1666.00    0.001447   221.82     0.421875   221.82     18   XXCEL
       Verifying contents of the file(s) just written.
       Sat Jul 11 05:05:23 2020
       [6] At transfer buffer #3, index #0 (file byte offset 235425792):
       [6] Expected: 0x0000000e5f098e05
       [6] Actual:   0x0000000e5f098dba
       [6] At transfer buffer #3, index #2 (file byte offset 235425808):
       [6] Expected: 0x0000000e5f098e05
       [6] Actual:   0x0000000e5f098dba

      IOR job: -a POSIX -i 20 -w -r -W -t 1666K -b 28322K -C -e -k -vv -E -F -q -s 33 -x -z

      File in question: /lus/snx11205/ostest.vers/alsorun.20200711032404.2543.pollux-p4/CL_IOR_all_wr_20iter_1666K_rand.2.9t3jzC.1594459731/CL_IOR_all_wr_20iter_1666K_rand/IORfile.00000004
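
      For triage, the mismatch in the log can be decoded with a few lines of Python. This is only a sketch: it assumes the low 32 bits of each reported word carry the run's data-signature time stamp (which is consistent with the values logged for repetitions 17 and 18) and that, for file-per-process access, a byte offset maps to segment * blocksize + transfer * xfersize. The upper 32 bits are not interpreted. The helper names split_word and locate are hypothetical and not part of IOR.

```python
# All constants are copied from the IOR log and command line in this ticket.
EXPECTED = 0x0000000e5f098e05   # word the verify pass expected
ACTUAL = 0x0000000e5f098dba     # word actually read back
STAMP_REP_17 = 1594461626       # 0x5f098dba, time stamp of repetition 17
STAMP_REP_18 = 1594461701       # 0x5f098e05, time stamp of repetition 18
XFER_BYTES = 1666 * 1024        # -t 1666K
BLOCK_BYTES = 28322 * 1024      # -b 28322K
BAD_OFFSET = 235425792          # first reported file byte offset


def split_word(word):
    """Return (upper 32 bits, lower 32 bits) of a 64-bit word."""
    return word >> 32, word & 0xFFFFFFFF


def locate(offset):
    """Map a file byte offset to (segment, transfer in block, byte in transfer).

    Assumes one block per segment in each file (file-per-process layout).
    """
    segment, in_block = divmod(offset, BLOCK_BYTES)
    transfer, in_xfer = divmod(in_block, XFER_BYTES)
    return segment, transfer, in_xfer


if __name__ == "__main__":
    _, expected_stamp = split_word(EXPECTED)
    _, actual_stamp = split_word(ACTUAL)
    print("expected stamp matches repetition 18:", expected_stamp == STAMP_REP_18)
    print("actual stamp matches repetition 17:", actual_stamp == STAMP_REP_17)
    print("seconds between the two stamps:", expected_stamp - actual_stamp)
    print("segment, transfer, byte in transfer:", locate(BAD_OFFSET))
```

      Under those assumptions, the stale word carries the time stamp of repetition 17, written 75 seconds before repetition 18, and the first bad offset falls exactly on a transfer boundary (zero-based segment 8, transfer 2 within the block, byte 0). That is consistent with at least the start of a 1666K transfer from repetition 18 never reaching the file, leaving the previous repetition's data in place.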
      

          People

            zam Alexander Zarochentsev