[LU-12022] sanity-flr: test_200 'checksum error for mirror 3' Created: 26/Feb/19 Updated: 21/Dec/22 Resolved: 22/Jul/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.12.1, Lustre 2.14.0, Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | James Nunez (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for paf <pfarrell@whamcloud.com>. This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/39847bcc-3985-11e9-8f69-52540065bddc The error reported is a checksum error, but the mirror resync had already failed entirely. The test should probably be updated to catch the failure at the resync step rather than report a checksum error later: lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed |
| Comments |
| Comment by Minh Diep [ 03/Apr/19 ] |
|
+1 on b2_12: https://testing.whamcloud.com/test_sets/1b46d18c-552d-11e9-9720-52540065bddc |
| Comment by James Nunez (Inactive) [ 09/Aug/19 ] |
|
Unfortunately, resync can fail and the test is still marked as PASS. I'll create a patch to fail and exit when we encounter a failed resync. See https://testing.whamcloud.com/sub_tests/2a53cf5e-b65d-11e9-b753-52540065bddc for sanity-flr test 200 marked as PASS, but all resync tests print "failed" == sanity-flr test 200: stress test ================================================================== 07:24:23 (1564842263) Starting client: trevis-19vm1: -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre2 CMD: trevis-19vm1 mkdir -p /mnt/lustre2 CMD: trevis-19vm1 mount -t lustre -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre2 Starting client: trevis-19vm1: -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre3 CMD: trevis-19vm1 mkdir -p /mnt/lustre3 CMD: trevis-19vm1 mount -t lustre -o user_xattr,flock trevis-19vm4@tcp:/lustre /mnt/lustre3 fail_loc=0x1A03 CMD: trevis-19vm4 /usr/sbin/lctl set_param fail_loc=0x1A03 fail_loc=0x1A03 lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..Extending file size to 4434464 .. Extending file size to 8121536 .. failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed Extending file size to 8432928 .. resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..Extending file size to 8678144 .. failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed Extending file size to 8785152 .. 
resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr 
with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed Extending file size to 9138848 .. 
lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..Extending file size to 8838944 .. 
done resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done resync file 
/mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed Extending file size to 8901856 .. resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed Extending file size to 9033472 .. lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..Extending file size to 9341920 .. 
failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..failed lock to resync file 
/mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e delay_before_copy -d 1' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done |
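To get a rough sense of how many resync attempts in a log like the one above ended in "failed" versus "done", the result markers can be tallied. This is only a sketch: `tally_resync_results` is a hypothetical helper, not part of sanity-flr, and it assumes each result is printed as `..failed` or `..done`. Results split by interleaved output (for example `..Extending file size to ... .. failed`) are not counted.

```shell
#!/bin/bash
# Hypothetical helper (not part of sanity-flr): tally resync outcomes
# in captured test output passed as a single string argument.
tally_resync_results() {
	local log="$1"
	local failed done_count

	# grep -o prints each match on its own line, so wc -l counts
	# occurrences even when many attempts share one physical line.
	failed=$(printf '%s\n' "$log" | grep -o '\.\.failed' | wc -l)
	done_count=$(printf '%s\n' "$log" | grep -o '\.\.done' | wc -l)

	# Arithmetic expansion strips any padding that wc may add.
	echo "failed=$((failed)) done=$((done_count))"
}

# Example with two attempts copied from the log above.
sample="resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed lock to resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..done"
tally_resync_results "$sample"
```

A tally like this only summarizes the output. It does not say whether any individual failure is expected.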
| Comment by Gerrit Updater [ 09/Aug/19 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35754 |
| Comment by James Nunez (Inactive) [ 29/Aug/19 ] |
|
Note: The patch https://review.whamcloud.com/35754 does not fix the underlying reason sanity-flr is failing. It only makes it very clear when the test fails. |
| Comment by Andreas Dilger [ 18/May/21 ] |
|
+1 for master https://testing.whamcloud.com/test_sets/2d0931a5-bdef-4565-8ba6-ed24fd0392e9 |
| Comment by Andreas Dilger [ 14/Jun/21 ] |
|
James, I don't think that the resync errors should actually be considered test failures. If the file changes while the resync is happening, then the resync is aborted and needs to be done again. That's just how FLR is currently implemented. However, the resync at the end of the test (after the write threads have been stopped) should properly resync the stale mirrors. It isn't clear why this test is still using "mirror_io resync" instead of "lfs mirror resync", since the latter is the tool that is used in production and is the tool we care about working properly. The mirror_io tool was a temporary FLR development tool, and its use and code should probably be removed (I am not aware of any functionality it has that is not available via "lfs mirror", but if there is, we should consider moving it over). |
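The distinction drawn in this comment, between tolerated aborts while writers are active and a final resync that must succeed, could be sketched as a small retry wrapper. This is an illustration under assumptions, not the code from the landed patch: `resync_with_retry` and its attempt limit are hypothetical, and the wrapper simply reruns whatever command it is given until that command exits with status 0.

```shell
#!/bin/bash
# Hypothetical helper (not from the landed patch): rerun a resync command
# until it succeeds or the attempt budget is exhausted.
# Usage: resync_with_retry MAX_ATTEMPTS COMMAND [ARGS...]
resync_with_retry() {
	local max_attempts="$1"
	local attempt=1
	shift

	while [ "$attempt" -le "$max_attempts" ]; do
		# Run the supplied command; exit status 0 means it completed.
		if "$@"; then
			echo "resync succeeded on attempt $attempt"
			return 0
		fi
		echo "resync attempt $attempt did not complete, retrying" >&2
		attempt=$((attempt + 1))
	done

	echo "resync still failing after $max_attempts attempts" >&2
	return 1
}

# Intended use once the write threads have been stopped
# (file path taken from the log in this ticket; the limit of 5 is arbitrary):
# resync_with_retry 5 lfs mirror resync /mnt/lustre3/f200.sanity-flr
```

With a wrapper like this, a test would only report an error when the final resync never completes. Aborts caused by concurrent writes earlier in the run would not count as failures.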
| Comment by Gerrit Updater [ 22/Jul/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35754/ |
| Comment by Peter Jones [ 22/Jul/21 ] |
|
Landed for 2.15 |
| Comment by Sebastien Buisson [ 18/Aug/21 ] |
|
Seen again here: |
| Comment by Patrick Farrell [ 20/Dec/21 ] |
|
+1 |