[LU-11562] replay-single test 87a fails with 'New checksum d41d8cd98f00b204e9800998ecf8427e does not match original X' Created: 23/Oct/18  Updated: 09/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.10.3, Lustre 2.10.5, Lustre 2.13.0, Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10702 replay-single test_87a: checksum does... Resolved
is related to LU-11561 Change syncjournal back to sync_journal Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

replay-single test_87a fails when it compares a file's checksum before and after an OST failover. In the client test_log for https://testing.whamcloud.com/test_sets/193c29e2-d05c-11e8-82f2-52540065bddc, the OSTs do not show up in the output of 'lfs df':

== replay-single test 87a: write replay ============================================================== 00:30:41 (1539588641)
CMD: onyx-39vm10 lctl set_param -n obdfilter.lustre-OST0000.sync_journal 0
onyx-39vm10: error: set_param: param_path 'obdfilter/lustre-OST0000/sync_journal': No such file or directory
CMD: onyx-39vm10 sync; sync; sync
UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID      5825660       47548     5255280   1% /mnt/lustre[MDT:0]

filesystem_summary:            0           0           0   0% /mnt/lustre

When this test passes, we see the OSTs listed in 'lfs df'. More than likely a previous test did not clean up after itself. Note: The sync_journal parameter will be taken care of in LU-11561.
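The missing OSTs can be checked mechanically. The following is a minimal sketch, not part of the test suite, that counts MDT and OST target lines in 'lfs df' text by looking for the trailing "[MDT:n]" or "[OST:n]" tag. The function name count_targets and the constant FAILING_OUTPUT are hypothetical; the sample text is the output copied from the failing log above.

```python
import re

# Matches the trailing "[MDT:0]" / "[OST:3]" tag at the end of each target line.
TARGET_TAG = re.compile(r"\[(MDT|OST):(\d+)\]\s*$")


def count_targets(lfs_df_output):
    """Return {'MDT': n, 'OST': m} for the target lines found in 'lfs df' text."""
    counts = {"MDT": 0, "OST": 0}
    for line in lfs_df_output.splitlines():
        match = TARGET_TAG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Output copied from the failing test log in this ticket.
FAILING_OUTPUT = """\
UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID      5825660       47548     5255280   1% /mnt/lustre[MDT:0]

filesystem_summary:            0           0           0   0% /mnt/lustre
"""

if __name__ == "__main__":
    counts = count_targets(FAILING_OUTPUT)
    print(counts)
    if counts["OST"] == 0:
        print("no OSTs listed")
```

For the failing log this reports one MDT and zero OSTs, which matches the filesystem_summary line of all zeros.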

In the client log, the write to the file appears to succeed, but after the OSS has failed over, the dd that reads the file back for the checksum calculation does not read or write any data:

8+0 records in
8+0 records out
8388608 bytes (8.4 MB, 8.0 MiB) copied, 0.531744 s, 15.8 MB/s
...
0+0 records in
0+0 records out
0 bytes copied, 0.00189394 s, 0.0 kB/s
 replay-single test_87a: @@@@@@ FAIL: New checksum d41d8cd98f00b204e9800998ecf8427e does not match original 51109c0cd52a9fa425c47b018ed0708e 
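
The zero-length read in the dd output above can be detected from the "records in" lines. This is a rough sketch for triaging logs, assuming the GNU dd summary format shown in this ticket; the names dd_records, read_nothing, and LOG_EXCERPT are hypothetical.

```python
import re

# GNU dd summary line, e.g. "8+0 records in" (full blocks + partial blocks).
RECORDS = re.compile(r"^(\d+)\+(\d+) records (in|out)$")


def dd_records(log_text):
    """Return a list of (direction, full_blocks, partial_blocks) tuples."""
    found = []
    for line in log_text.splitlines():
        match = RECORDS.match(line.strip())
        if match:
            found.append((match.group(3), int(match.group(1)), int(match.group(2))))
    return found


def read_nothing(log_text):
    """True if any 'records in' line reports zero full and zero partial blocks."""
    return any(
        direction == "in" and full == 0 and partial == 0
        for direction, full, partial in dd_records(log_text)
    )


# dd summary lines copied from the client log in this ticket.
LOG_EXCERPT = """\
8+0 records in
8+0 records out
0+0 records in
0+0 records out
"""

if __name__ == "__main__":
    print(dd_records(LOG_EXCERPT))
    print(read_nothing(LOG_EXCERPT))
```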

Whenever replay-single test 87a fails with the checksum mismatch, the "new checksum" is always the same: d41d8cd98f00b204e9800998ecf8427e. The original checksum differs from one failure to the next.
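
The constant value d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of zero bytes of input, which is consistent with the post-failover dd reading 0 bytes. A minimal sketch to confirm this, assuming the test's checksum is MD5 (the digest length and value match); the helper name is_empty_read is hypothetical.

```python
import hashlib

# MD5 digest of empty input; this is what a checksum over a 0-byte read produces.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def is_empty_read(checksum):
    """True if the reported checksum is the MD5 of zero bytes."""
    return checksum.strip().lower() == EMPTY_MD5


if __name__ == "__main__":
    print(EMPTY_MD5)
    # "New checksum" from the failure message in this ticket.
    print(is_empty_read("d41d8cd98f00b204e9800998ecf8427e"))
    # "Original" checksum from the same failure message.
    print(is_empty_read("51109c0cd52a9fa425c47b018ed0708e"))
```

A check like this separates "the file came back empty" from genuine data corruption, where the new checksum would vary between runs.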

We see this error at least as far back as May 2018.

More replay-single test_87a failures are at
https://testing.whamcloud.com/test_sets/f3e40540-d13f-11e8-ad90-52540065bddc
https://testing.whamcloud.com/test_sets/7ad86b32-cae1-11e8-b589-52540065bddc
https://testing.whamcloud.com/test_sets/94e8844a-c8a1-11e8-82f2-52540065bddc
https://testing.whamcloud.com/test_sets/fc93fec0-c8a2-11e8-82f2-52540065bddc
https://testing.whamcloud.com/test_sets/1bedfd14-b068-11e8-bbd1-52540065bddc



 Comments   
Comment by Jian Yu [ 23/Aug/19 ]

+1 on master branch: https://testing.whamcloud.com/test_sets/f365e340-c5e5-11e9-98c8-52540065bddc

Comment by Bruno Faccini (Inactive) [ 31/Oct/19 ]

+1 on master: https://testing.whamcloud.com/test_sessions/431dfbf0-778f-46c2-84d5-e61058e669d6

Comment by Andreas Dilger [ 04/Sep/20 ]

+1 on master: https://testing.whamcloud.com/test_sets/2d6b7675-3c63-40cd-975b-d856f6e8daf0

Comment by Andreas Dilger [ 14/Dec/21 ]

+1 on master: https://testing.whamcloud.com/test_sets/0b6cbfe1-ab6f-4ab1-adea-6d8d959d05f9

Comment by Nikitas Angelinas [ 31/Oct/23 ]

+1 on master: https://testing.whamcloud.com/test_sets/4176e033-ca9e-440d-914d-cbadc029566e
