
[LU-13899] IOR data corruption detected during automated lnet fofb testing

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.14.0

    Description

      Seeing data corruption in two IOR jobs while running the lnet failover test with "Regression write/verify DNE2 PFL". Both failures show data from the immediately preceding run: the Expected/Actual words in the verify errors decode to the two consecutive data-signature time stamps (see the decoding sketch after the log excerpt below). This is likely a cache-related issue; fsync-related warnings are also logged.

      CL_IOR_all_wr_20iter_1666K_rand

       access             = file-per-process
       	pattern            = strided (33 segments)
       	ordering in a file = random offsets
       	ordering inter file=constant task offsets = 1
       	clients            = 48 (4 per node)
       	repetitions        = 20
       	xfersize           = 1.63 MiB
       	blocksize          = 27.66 MiB
       	aggregate filesize = 42.78 GiB
      
       read      1316.54    28322      1666.00    0.001729   33.27      0.010429   33.28      16   XXCEL
       Using Time Stamp 1594461626 (0x5f098dba) for Data Signature
       Commencing write performance test.
       Sat Jul 11 05:00:26 2020
       read      1852.87    28322      1666.00    0.001450   23.64      0.004490   23.64      17   XXCEL
       Using Time Stamp 1594461701 (0x5f098e05) for Data Signature
       Commencing write performance test.
       Sat Jul 11 05:01:41 2020
       WARNING: cannot perform fsync on file.
       WARNING: cannot perform fsync on file.
       write     197.50     28322      1666.00    0.001447   221.82     0.421875   221.82     18   XXCEL
       Verifying contents of the file(s) just written.
       Sat Jul 11 05:05:23 2020 [6] At transfer buffer #3, index #0 (file byte offset 235425792):
       [6] Expected: 0x0000000e5f098e05
       [6] Actual:   0x0000000e5f098dba
       [6] At transfer buffer #3, index #2 (file byte offset 235425808):
       [6] Expected: 0x0000000e5f098e05
       [6] Actual:   0x0000000e5f098dba

       IOR job: -a POSIX -i 20 -w -r -W -t 1666K -b 28322K -C -e -k -vv -E -F -q -s 33 -x -z
      
      
      File in question: /lus/snx11205/ostest.vers/alsorun.20200711032404.2543.pollux-p4/CL_IOR_all_wr_20iter_1666K_rand.2.9t3jzC.1594459731/CL_IOR_all_wr_20iter_1666K_rand/IORfile.00000004
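
      The Expected/Actual words in the verify error are consistent with the "data from the previous run" observation: the low 32 bits of each word match the two data-signature time stamps printed above (1594461701 for the current write pass, 1594461626 for the previous one). A minimal decoding sketch, assuming IOR keeps the 32-bit data-signature time stamp in the low word of each 64-bit data element; this helper is illustrative only and not part of IOR:

       #include <inttypes.h>
       #include <stdint.h>
       #include <stdio.h>

       int main(void)
       {
               uint64_t expected = 0x0000000e5f098e05ULL; /* from the verify error */
               uint64_t actual   = 0x0000000e5f098dbaULL;

               /* Low 32 bits: 1594461701 vs 1594461626, i.e. the current and the
                * previous "Using Time Stamp ... for Data Signature" values above. */
               printf("expected signature %" PRIu32 "\n", (uint32_t)(expected & 0xffffffffULL));
               printf("actual   signature %" PRIu32 "\n", (uint32_t)(actual & 0xffffffffULL));
               return 0;
       }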
      

      Attachments

        Activity

          [LU-13899] IOR data corruption detected during automated lnet fofb testing
          pjones Peter Jones added a comment -

          Landed for 2.14


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39612/
          Subject: LU-13899 tgt: drop old epoch request
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 4f192768293364c65411015de7531f62fdfb754c


          zam Alexander Zarochentsev added a comment -

          > shouldn't this data integrity issue carry a higher priority? I think we ought to target landing for 2.14.0.

          agreed
          eaujames Etienne Aujames added a comment - edited

          Hello,
          Is a backport planned for b2_12?

          spitzcor Cory Spitz added a comment -

          zam, shouldn't this data integrity issue carry a higher priority? I think we ought to target landing for 2.14.0.


          gerrit Gerrit Updater added a comment -

          Alexander Zarochentsev (alexander.zarochentsev@hpe.com) uploaded a new patch: https://review.whamcloud.com/39612
          Subject: LU-13899 tgt: drop old epoch request
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 2af903484430def99448b0e0ba8685aec720dda6


          zam Alexander Zarochentsev added a comment -

          Further debugging showed that:
          1. the failed fsync() was due to an AS_EIO flag set on the address space, and 492 pages with the PG_error flag set (how this surfaces to the application is sketched after this comment):

          crash-7.2.8> kmem -p | grep ffff8803f2c10c50|awk '{print $7}'|sort|uniq -c
              491 error,referenced,uptodate,lru,private
                1 error,uptodate,lru,active,private
           232875 referenced,uptodate,lru,private
               11 referenced,uptodate,private
              279 uptodate,lru,active,private
          crash-7.2.8>
          

          2. A debug patch gave us error code -ESTALE:

          18282:0:3564:0:(vvp_page.c:304:vvp_page_completion_write()) page@ffff88017556f200[2 ffff8801f86630e8 0 1           (null)]
          00000020:00000001:28.0:1595864091.118282:0:3566:0:(cl_page.c:840:cl_req_type_state()) Process leaving (rc=2 : 2 : 2)
          00000080:00008000:25.0:1595864091.118283:0:3564:0:(vvp_page.c:304:vvp_page_completion_write()) completing WRITE with -116
          00000020:00000001:28.0:1595864091.118283:0:3566:0:(cl_page.c:925:cl_page_completion()) Process entered
          00000080:00040000:25.0:1595864091.118284:0:3564:0:(vvp_page.c:246:vvp_vmpage_error()) LBUG
          00000020:00000001:28.0:1595864091.118285:0:3566:0:(cl_page.c:926:cl_page_completion()) page@ffff8801ff17fc00[2 ffff8801f86630e8 2 1 
                    (null)]
          00000020:00000001:28.0:1595864091.118286:0:3566:0:(cl_page.c:926:cl_page_completion()) 1 0
          00000020:00000001:28.0:1595864091.118287:0:3566:0:(cl_page.c:341:cl_page_state_set0()) Process entered
          00000020:00000001:28.0:1595864091.118288:0:3566:0:(cl_page.c:344:cl_page_state_set0()) page@ffff8801ff17fc00[2 ffff8801f86630e8 2 1 
                    (null)]
          

          3. and finally, the -ESTALE comes from process_req_last_xid():

          00000100:00000200:15.0:1596013191.408113:0:11072:0:(service.c:2233:ptlrpc_server_handle_request()) got req 1673498761985920
          00000020:00000001:15.0:1596013191.408114:0:11072:0:(tgt_handler.c:706:tgt_request_handle()) Process entered
          00000020:00000001:15.0:1596013191.408114:0:11072:0:(tgt_handler.c:637:process_req_last_xid()) Process entered
          00000020:00000001:15.0:1596013191.408115:0:11072:0:(tgt_handler.c:679:process_req_last_xid()) Process leaving via out (rc=18446744073709551500 : -116 : 0xffffffffffffff8c)
          00000020:00000001:15.0:1596013191.408116:0:11072:0:(tgt_handler.c:691:process_req_last_xid()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
          

          It appears the -ESTALE is not handled properly on the client (the rc values above decode to -116, i.e. -ESTALE; see below).
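
          For reference, the rc values in the process_req_last_xid() trace are two renderings of the same negative errno: 18446744073709551500 is the unsigned form of -116, 0xffffffffffffff8c is its hex form, and errno 116 is ESTALE on Linux. A small illustrative check, not Lustre code:

           #include <inttypes.h>
           #include <stdint.h>
           #include <stdio.h>

           int main(void)
           {
                   /* The value logged above, reinterpreted as a signed 64-bit
                    * integer on a two's-complement machine. */
                   int64_t rc = (int64_t)0xffffffffffffff8cULL;

                   printf("rc = %" PRId64 "\n", rc);   /* prints -116; ESTALE is 116 on Linux */
                   return 0;
           }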

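          The application-visible side of the above is worth noting: the buffered write()s succeed against the client page cache, and the write-back failure only surfaces when fsync() returns an error, which IOR reports as the "WARNING: cannot perform fsync on file." seen in the description, after which the verify pass reads back the previous iteration's data. A minimal user-space sketch of that failure mode (generic write/fsync behaviour, not Lustre code):

           #include <errno.h>
           #include <stdio.h>
           #include <string.h>
           #include <unistd.h>

           /* Illustrative only: a buffered write can report success even though the
            * data never reached stable storage; the error saved on the file's
            * mapping (AS_EIO above) is handed back by the next fsync(). */
           static int write_and_sync(int fd, const void *buf, size_t len)
           {
                   if (write(fd, buf, len) != (ssize_t)len)
                           return -1;              /* immediate error or short write */

                   if (fsync(fd) < 0) {
                           /* This is where the corruption in this ticket first became
                            * visible; treating it as fatal rather than a warning would
                            * stop the job before the read-back verification. */
                           fprintf(stderr, "fsync: %s\n", strerror(errno));
                           return -1;
                   }
                   return 0;                       /* data reported durable */
           }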

          People

            Assignee: zam Alexander Zarochentsev
            Reporter: zam Alexander Zarochentsev
            Votes: 0
            Watchers: 6
