Lustre / LU-12786

ior_hard read fails due to data verification failure

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not a Bug
    • Affects Version/s: Lustre 2.13.0
    • Fix Version/s: None
    • Labels:
      None
    • Environment:
      master
    • Severity:
      2

      Description

      Got the following error with IOR. This is the ior_hard_read workload (non-aligned I/O to a single shared file) run from multiple clients (10N240P: 10 nodes, 240 processes).

      IOR-3.3.0+dev: MPI Coordinated Test of Parallel I/O
      Began               : Thu Sep 19 12:27:39 2019
      Command line        : /work/BMLab/Lustre/io500/io-500-dev/bin/ior -r -R -s 132000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /scratch0/io500.out/ior_hard/IOR_file -O stoneWallingStatusFile=/scratch0/io500.out/ior_hard/stonewall
      Machine             : Linux c11
      TestID              : 0
      StartTime           : Thu Sep 19 12:27:39 2019
      Path                : /scratch0/io500.out/ior_hard
      FS                  : 52.4 TiB   Used FS: 22.3%   Inodes: 423.7 Mi   Used Inodes: 14.9%
      
      Options: 
      api                 : POSIX
      apiVersion          : 
      test filename       : /scratch0/io500.out/ior_hard/IOR_file
      access              : single-shared-file
      type                : independent
      segments            : 132000
      ordering in a file  : sequential
      ordering inter file : constant task offset
      task offset         : 1
      tasks               : 240
      clients per node    : 24
      repetitions         : 1
      xfersize            : 47008 bytes
      blocksize           : 47008 bytes
      aggregate filesize  : 1.35 TiB
      
      Results: 
      
      access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s) iter
      ------    ---------  ---------- ---------  --------   --------   --------   -------- ----
      [14] FAILED comparison of buffer containing 8-byte ints:
      [14]   File name = /scratch0/io500.out/ior_hard/IOR_file
      [14]   In transfer 0, 5120 errors between buffer indices 328 and 5447.
      [14]   File byte offset = 508297412608:
      [14]     Expected: 0x000000260000001b 0000000000000a48 000000260000001b 0000000000000a58 
      [14]     Actual:   0x0000000000000000 0000000000000000 0000000000000000 0000000000000000 
      [27] FAILED comparison of buffer containing 8-byte ints:
      [27]   File name = /scratch0/io500.out/ior_hard/IOR_file
      [27]   In transfer 0, 5668 errors between buffer indices 0 and 5667.
      [27]   File byte offset = 542967361248:
      [27]     Expected: 0x000000330000001b 0000000000000008 000000330000001b 0000000000000018 
      [27]     Actual:   0x000000890000001b 0000000000006ee8 000000890000001b 0000000000006ef8 
      WARNING: incorrect data on read (10788 errors found).
      Used Time Stamp 27 (0x1b) for Data Signature
      read      29032      45.91      45.91      0.112576   48.89      0.109625   48.92      0   
      Max Read:  29031.84 MiB/sec (30442.09 MB/sec)
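The reported offsets can be cross-checked against the IOR parameters. The sketch below is a minimal Python check, assuming IOR's segmented single-shared-file layout, in which the block for task slot k in segment s starts at s * (tasks * blockSize) + k * blockSize, and assuming the "File byte offset" printed by IOR is the offset of the first mismatched 8-byte word. The helper names (aggregate_size, decompose, error_span) are hypothetical.

```python
"""Decompose IOR single-shared-file offsets (sketch; layout formula assumed)."""

# Values taken from the IOR command line and option dump above.
BLOCK_SIZE = 47008   # -b 47008 (equal to -t, so one transfer per block)
NUM_TASKS = 240      # tasks
SEGMENTS = 132000    # -s 132000
WORD = 8             # IOR verifies 8-byte ints


def aggregate_size(block=BLOCK_SIZE, tasks=NUM_TASKS, segments=SEGMENTS):
    """Total file size in bytes: every task writes one block per segment."""
    return block * tasks * segments


def decompose(offset, block=BLOCK_SIZE, tasks=NUM_TASKS):
    """Split a file offset into (segment, task slot, byte within block).

    Assumes offset = segment * tasks * block + slot * block + byte."""
    segment, rest = divmod(offset, block * tasks)
    slot, byte = divmod(rest, block)
    return segment, slot, byte


def error_span(offset, first_idx, last_idx):
    """Byte range [start, end) covered by words first_idx..last_idx.

    Assumes the reported offset is that of word first_idx."""
    return offset, offset + (last_idx - first_idx + 1) * WORD


if __name__ == "__main__":
    print(f"aggregate size: {aggregate_size() / 2**40:.2f} TiB")
    # (reader rank, reported file byte offset, first and last bad index)
    for rank, off, lo, hi in [(14, 508297412608, 328, 5447),
                              (27, 542967361248, 0, 5667)]:
        seg, slot, byte = decompose(off)
        start, end = error_span(off, lo, hi)
        print(f"[{rank}] segment {seg}, slot {slot}, word {byte // WORD}, "
              f"{end - start} bad bytes in [{start}, {end})")
```

With these inputs the sketch reproduces the 1.35 TiB aggregate size from the option dump and places rank 14's first error at word 328 of task slot 38 (segment 45054) and rank 27's at word 0 of slot 51 (segment 48127), covering 40960 and 45344 bytes respectively. The word indices match the buffer indices IOR reported, which supports this reading of the offsets.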
      

      Even when tested several times using different clients, IOR reports that the same MPI ranks read data that differs from what was expected.
      It seems the problem happens at write time, i.e. incorrect data was stored in the file.
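The Expected/Actual words can be decoded to see whose data each rank actually read back. This minimal sketch assumes the fill pattern of IOR's default data buffer as we understand it: even-indexed 8-byte words hold (rank << 32) | timestamp, and odd-indexed words hold a base offset plus index * 8; the expected values in the dump are consistent with a base of 0. It also assumes that -C with -Q 1 makes each rank verify the data written by the rank one node (24 tasks) ahead. The function names are hypothetical.

```python
"""Decode IOR data-signature words from the verification dump (sketch).

Assumed fill pattern: even-indexed words hold (rank << 32) | timestamp,
odd-indexed words hold base + index * 8, with base == 0 as the expected
values in this dump suggest.
"""


def decode_pair(even_word, odd_word):
    """Return (rank, timestamp, writer_word_index) for one word pair."""
    rank = even_word >> 32
    timestamp = even_word & 0xFFFFFFFF
    writer_word_index = odd_word // 8   # meaningful only if base == 0
    return rank, timestamp, writer_word_index


def writer_rank(reader, rank_offset=24, tasks=240):
    """Rank whose data `reader` verifies when reads are shifted by -C/-Q.

    rank_offset=24 (one node of 24 tasks) is an assumption."""
    return (reader + rank_offset) % tasks


# (reader rank, label, even word, odd word) copied from the IOR output.
SAMPLES = [
    (14, "expected", 0x000000260000001B, 0x0000000000000A48),
    (14, "actual",   0x0000000000000000, 0x0000000000000000),
    (27, "expected", 0x000000330000001B, 0x0000000000000008),
    (27, "actual",   0x000000890000001B, 0x0000000000006EE8),
]

if __name__ == "__main__":
    for reader, label, even, odd in SAMPLES:
        rank, ts, idx = decode_pair(even, odd)
        print(f"[{reader}] {label:8s} rank={rank} ts={ts} word={idx} "
              f"(reader expects rank {writer_rank(reader)})")
```

Decoded this way, the expected words for ranks 14 and 27 belong to ranks 38 and 51 (reader + 24) with time stamp 27, matching "Used Time Stamp 27 (0x1b)". Rank 27's actual data carries the same time stamp but decodes to rank 137, word 3549, i.e. another rank's data, while rank 14's mismatched range reads back as zeros.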

      Overstriping is enabled on this directory: the stripe count is set to 240 across 8 OSTs.

      # lfs getstripe /scratch0/io500.out/ior_hard           
      /scratch0/io500.out/ior_hard
      stripe_count:  240 stripe_size:   16777216 pattern:       raid0,overstriped stripe_offset: -1
      
      /scratch0/io500.out/ior_hard/stonewall
      lmm_stripe_count:  240
      lmm_stripe_size:   16777216
      lmm_pattern:       raid0,overstriped
      lmm_layout_gen:    0
      lmm_stripe_offset: 2
      	obdidx		 objid		 objid		 group
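To see which stripe objects the corrupted ranges fall in, the offsets can be mapped through the RAID-0 layout. This sketch assumes IOR_file inherited the directory default shown above (240 stripes of 16 MiB) and uses plain round-robin RAID-0 arithmetic; overstriping is assumed only to allow several stripe objects per OST without changing that arithmetic. Which OST holds a given stripe index comes from the obdidx table, which is cut off here, so the sketch stops at the stripe index. The function names are hypothetical.

```python
"""Map a file offset to a RAID-0 stripe (sketch of round-robin striping)."""

STRIPE_SIZE = 16777216   # lmm_stripe_size (16 MiB)
STRIPE_COUNT = 240       # lmm_stripe_count (overstriped over 8 OSTs)


def stripe_of(offset, size=STRIPE_SIZE, count=STRIPE_COUNT):
    """Return (stripe index, offset inside that stripe's object)."""
    stripe_no = offset // size            # global stripe number in the file
    index = stripe_no % count             # which of the `count` objects
    obj_off = (stripe_no // count) * size + offset % size
    return index, obj_off


def stripes_touched(start, end, size=STRIPE_SIZE, count=STRIPE_COUNT):
    """Sorted stripe indices covering bytes [start, end)."""
    first, last = start // size, (end - 1) // size
    return sorted({n % count for n in range(first, last + 1)})


if __name__ == "__main__":
    # (reported offset, bad range length: 5120 and 5668 words of 8 bytes)
    for off, length in [(508297412608, 40960), (542967361248, 45344)]:
        idx, obj_off = stripe_of(off)
        print(f"offset {off}: stripe {idx}, object offset {obj_off}, "
              f"stripes touched {stripes_touched(off, off + length)}")
```

Under these assumptions the two reported offsets land in stripe indices 56 and 203, and each mismatched range stays within a single 16 MiB stripe.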
      

              People

              • Assignee: Dongyang Li (dongyang)
              • Reporter: Shuichi Ihara (sihara)