Details
-
Bug
-
Resolution: Not a Bug
-
Critical
-
None
-
Lustre 2.13.0
-
None
-
master
-
2
-
9223372036854775807
Description
Got the following error with IOR. This is the ior_hard_read workload (non-aligned I/O to a single shared file) run from multiple clients (10N240P: 10 nodes, 240 processes).
IOR-3.3.0+dev: MPI Coordinated Test of Parallel I/O
Began               : Thu Sep 19 12:27:39 2019
Command line        : /work/BMLab/Lustre/io500/io-500-dev/bin/ior -r -R -s 132000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /scratch0/io500.out/ior_hard/IOR_file -O stoneWallingStatusFile=/scratch0/io500.out/ior_hard/stonewall
Machine             : Linux c11
TestID              : 0
StartTime           : Thu Sep 19 12:27:39 2019
Path                : /scratch0/io500.out/ior_hard
FS                  : 52.4 TiB   Used FS: 22.3%   Inodes: 423.7 Mi   Used Inodes: 14.9%

Options:
api                 : POSIX
apiVersion          :
test filename       : /scratch0/io500.out/ior_hard/IOR_file
access              : single-shared-file
type                : independent
segments            : 132000
ordering in a file  : sequential
ordering inter file : constant task offset
task offset         : 1
tasks               : 240
clients per node    : 24
repetitions         : 1
xfersize            : 47008 bytes
blocksize           : 47008 bytes
aggregate filesize  : 1.35 TiB

Results:

access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ---------- ---------  --------   --------   --------   --------   ----
[14] FAILED comparison of buffer containing 8-byte ints:
[14]   File name = /scratch0/io500.out/ior_hard/IOR_file
[14]   In transfer 0, 5120 errors between buffer indices 328 and 5447.
[14]   File byte offset = 508297412608:
[14]     Expected: 0x000000260000001b 0000000000000a48 000000260000001b 0000000000000a58
[14]     Actual:   0x0000000000000000 0000000000000000 0000000000000000 0000000000000000
[27] FAILED comparison of buffer containing 8-byte ints:
[27]   File name = /scratch0/io500.out/ior_hard/IOR_file
[27]   In transfer 0, 5668 errors between buffer indices 0 and 5667.
[27]   File byte offset = 542967361248:
[27]     Expected: 0x000000330000001b 0000000000000008 000000330000001b 0000000000000018
[27]     Actual:   0x000000890000001b 0000000000006ee8 000000890000001b 0000000000006ef8
WARNING: incorrect data on read (10788 errors found).
Used Time Stamp 27 (0x1b) for Data Signature
read      29032      45.91      45.91      0.112576   48.89      0.109625   48.92      0
Max Read:  29031.84 MiB/sec (30442.09 MB/sec)
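To help read the two failure reports, here is a minimal Python sketch that maps a reported file byte offset back to the IOR segment and writer rank. It assumes the usual IOR single-shared-file layout, in which the block for (segment, rank) starts at (segment * tasks + rank) * blocksize, and it assumes that the upper 32 bits of each even-indexed 8-byte word hold the writer rank while the lower 32 bits hold the timestamp signature. The function names `locate` and `split_signature` are hypothetical helpers, not part of IOR.

```python
# Values taken from the IOR log in this ticket.
TASKS = 240          # "tasks"
BLOCK_SIZE = 47008   # -b 47008 (same as -t, so one transfer per block)


def locate(file_offset):
    """Map a file byte offset to (segment, writer_rank, byte_offset_in_block).

    Assumes the block for (segment, rank) starts at
    (segment * TASKS + rank) * BLOCK_SIZE.
    """
    block_index, in_block = divmod(file_offset, BLOCK_SIZE)
    segment, writer_rank = divmod(block_index, TASKS)
    return segment, writer_rank, in_block


def split_signature(word):
    """Split an 8-byte word into (upper 32 bits, lower 32 bits).

    Assumption: upper half = writer rank, lower half = timestamp signature.
    """
    return word >> 32, word & 0xFFFFFFFF


if __name__ == "__main__":
    reports = [
        # (reader rank, reported file byte offset, first expected word)
        (14, 508297412608, 0x000000260000001B),
        (27, 542967361248, 0x000000330000001B),
    ]
    for reader, offset, expected in reports:
        segment, writer, in_block = locate(offset)
        sig_rank, sig_stamp = split_signature(expected)
        print(f"reader {reader}: segment={segment} writer={writer} "
              f"buffer_index={in_block // 8} "
              f"expected_rank={sig_rank} expected_stamp={sig_stamp}")
```

Under these assumptions the two offsets resolve to segment 45054, writer rank 38 (buffer index 328) and segment 48127, writer rank 51 (buffer index 0). Both agree with the upper halves of the expected words (0x26 and 0x33) and with readers 14 and 27 being shifted by one node of 24 tasks by -C. The lower half, 0x1b, is the timestamp 27 passed with -G 27.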
Even when the test is repeated several times using different clients, IOR reports that the same MPI ranks read data that differs from what is expected.
It seems that the problem happens at write time and that incorrect data was stored in the file.
Overstriping is enabled on this directory: a stripe count of 240 is set, overstriped across 8 OSTs.
# lfs getstripe /scratch0/io500.out/ior_hard
/scratch0/io500.out/ior_hard
stripe_count:  240 stripe_size:   16777216 pattern:       raid0,overstriped stripe_offset: -1

/scratch0/io500.out/ior_hard/stonewall
lmm_stripe_count:  240
lmm_stripe_size:   16777216
lmm_pattern:       raid0,overstriped
lmm_layout_gen:    0
lmm_stripe_offset: 2
	obdidx		 objid		 objid		 group
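As a rough aid for relating the failing offsets to this layout, the following Python sketch applies plain RAID0 striping arithmetic to the two reported file offsets. It assumes that IOR_file inherited the directory default shown above (stripe count 240, stripe size 16777216), which the truncated output does not confirm, and that the client page size is 4096 bytes. `stripe_of` and `is_page_aligned` are hypothetical helper names.

```python
STRIPE_SIZE = 16777216   # stripe_size from lfs getstripe
STRIPE_COUNT = 240       # stripe_count from lfs getstripe
PAGE_SIZE = 4096         # assumed client page size


def stripe_of(file_offset):
    """Return (stripe_index, object_offset) under plain RAID0 striping."""
    stripe_number, in_stripe = divmod(file_offset, STRIPE_SIZE)
    stripe_index = stripe_number % STRIPE_COUNT
    # Each full pass over all stripes advances every object by one stripe.
    object_offset = (stripe_number // STRIPE_COUNT) * STRIPE_SIZE + in_stripe
    return stripe_index, object_offset


def is_page_aligned(file_offset):
    """True if the offset falls on an (assumed) 4096-byte page boundary."""
    return file_offset % PAGE_SIZE == 0


if __name__ == "__main__":
    for offset in (508297412608, 542967361248):
        index, obj_off = stripe_of(offset)
        print(f"offset {offset}: stripe index {index}, "
              f"object offset {obj_off}, page aligned: {is_page_aligned(offset)}")
    # Rank 14 reported 5120 consecutive bad 8-byte words, all zero.
    print("zeroed bytes:", 5120 * 8, "pages:", 5120 * 8 / PAGE_SIZE)
```

With these assumptions the rank 14 failure falls in stripe index 56 and the rank 27 failure in stripe index 203. The zero-filled region seen by rank 14 starts on a page boundary and covers 40960 bytes, that is, exactly ten 4096-byte pages. The stripe index here is the position within the layout, not an OST index; mapping it to an OST would need the obdidx column that is cut off above.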