[LU-11697] BAD WRITE CHECKSUM with t10ip4K and t10ip512 checksums Created: 23/Nov/18 Updated: 04/Dec/18 Resolved: 04/Dec/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Alex Zhuravlev | Assignee: | Li Xi |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
I'm running racer and get the following messages quite often. I think Oleg can confirm this. LustreError: 168-f: lustre-OST0000: BAD WRITE CHECKSUM: from 12345-0@lo inode [0x200000402:0x3b89:0x0] object 0x0:1921 extent [326-4194303]: client csum 53f1f04e, server csum 9bf3f057 |
| Comments |
| Comment by Peter Jones [ 23/Nov/18 ] |
|
Li Xi Could you please advise? Thanks Peter |
| Comment by Alex Zhuravlev [ 23/Nov/18 ] |
|
can't reproduce with crc32 |
| Comment by Andreas Dilger [ 23/Nov/18 ] |
|
The evictions are likely caused by the resends due to checksum errors. If adler/crc32 do not generate checksum errors then they do not resend and are not evicted. It may be that the RPC resends are not being counted by the OST as "forward progress" for a DLM lock being cancelled, so the client is evicted due to non-responsiveness? In that case, it makes sense to update the DLM lock timer due to the resends so that the client is not evicted if the network is having problems. |
| Comment by Andreas Dilger [ 23/Nov/18 ] |
|
According to Oleg, there are checksum errors with other RPC types as well when running racer. It isn't clear why this is only affecting Alex with t10ip checksums. Remove the 2.12 tag for now. If this becomes a problem in the field (which is unlikely if it only appears during racer), then a different checksum type can be selected. |
| Comment by Oleg Drokin [ 26/Nov/18 ] |
|
I have this problem too. I applied this patch and all the RPC checksums went away: diff --git a/lustre/include/obd_cksum.h b/lustre/include/obd_cksum.h
index a2ce2ec..bcb23e5 100644
--- a/lustre/include/obd_cksum.h
+++ b/lustre/include/obd_cksum.h
@@ -98,7 +98,7 @@ static inline enum cksum_types obd_cksum_types_supported_clien
ret |= OBD_CKSUM_CRC32;
/* Client support all kinds of T10 checksum */
- ret |= OBD_CKSUM_T10_ALL;
+ ret |= 0; //OBD_CKSUM_T10_ALL;
return ret;
}
Also without the patch above I am having sporadic patches in checksum that seems to imply there are problems in this area that are deeper: [80863.424036] BUG: unable to handle kernel paging request at ffff880229b1a000 [80863.476713] IP: [<ffffffff813ee500>] do_csum+0x70/0x180 [80863.478571] PGD 241b067 PUD 33effc067 PMD 33eeae067 PTE 8000000229b1a060 [80863.478571] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [80863.483994] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) crc_t10dif crct10dif_generic crct10dif_common pcspkr i2c_piix4 virtio_balloon virtio_console ip_tables rpcsec_gss_krb5 ata_generic pata_acpi drm_kms_helper ttm drm drm_panel_orientation_quirks ata_piix serio_raw virtio_blk libata i2c_core floppy [last unloaded: libcfs] [80863.503099] CPU: 8 PID: 24979 Comm: ll_ost_io04_005 Kdump: loaded Tainted: P OE ------------ 3.10.0-7.6-debug #1 [80863.503099] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [80863.503099] task: ffff880303c42900 ti: ffff880089ff4000 task.ti: ffff880089ff4000 [80863.503099] RIP: 0010:[<ffffffff813ee500>] [<ffffffff813ee500>] do_csum+0x70/0x180 [80863.503099] RSP: 0018:ffff880089ff7938 EFLAGS: 00010246 [80863.503099] RAX: 0000000000000000 RBX: ffff880229b1a000 RCX: 0000000000000040 [80863.503099] RDX: ffff880229b1a000 RSI: 0000000000001000 RDI: ffff880229b1a000 [80863.503099] RBP: ffff880089ff7938 R08: 0000000000000000 R09: 0000000000000000 [80863.503099] R10: 0000000000000040 R11: 0000000000000200 R12: 0000000000001000 [80863.503099] R13: 0000000000001000 R14: 0000000000000001 R15: 0000000000001000 [80863.587717] FS: 0000000000000000(0000) GS:ffff88033dc00000(0000) knlGS:0000000000000000 [80863.587717] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [80863.587717] CR2: ffff880229b1a000 CR3: 00000000a792e000 CR4: 00000000000006e0 [80863.587717] Call Trace: [80863.599291] [<ffffffff813ee61e>] ip_compute_csum+0xe/0x30 [80863.642807] [<ffffffffa0572e1e>] obd_dif_ip_fn+0xe/0x10 [obdclass] [80863.642807] [<ffffffffa0572ee9>] obd_page_dif_generate_buffer+0xc9/0x190 [obdclass] [80863.642807] [<ffffffffa07a5e3f>] tgt_checksum_niobuf_rw+0x27f/0xea0 [ptlrpc] [80863.642807] [<ffffffffa0572e10>] ? obd_dif_crc_fn+0x20/0x20 [obdclass] [80863.642807] [<ffffffffa0572e10>] ? obd_dif_crc_fn+0x20/0x20 [obdclass] [80863.642807] [<ffffffffa07a842d>] tgt_brw_read+0xc2d/0x1e60 [ptlrpc] [80863.642807] [<ffffffff812127f4>] ? __kmalloc+0x634/0x660 [80863.736021] [<ffffffff813eca64>] ? vsnprintf+0x234/0x6a0 [80863.736021] [<ffffffffa0540119>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [80863.742878] [<ffffffffa0744fe6>] ? lustre_pack_reply_v2+0x166/0x290 [ptlrpc] [80863.742878] [<ffffffffa074517f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [80863.765470] [<ffffffffa0745301>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [80863.765470] [<ffffffffa07ac365>] tgt_request_handle+0xaf5/0x1590 [ptlrpc] [80863.765470] [<ffffffffa01f9fa7>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [80863.765470] [<ffffffffa0750436>] ptlrpc_server_handle_request+0x256/0xad0 [ptlrpc] [80863.775960] [<ffffffff810bfbd8>] ? __wake_up_common+0x58/0x90 [80863.775960] [<ffffffff813fb7bb>] ? do_raw_spin_unlock+0x4b/0x90 [80863.775960] [<ffffffffa0754329>] ptlrpc_main+0xa99/0x1f60 [ptlrpc] [80863.775960] [<ffffffff810c32ed>] ? finish_task_switch+0x5d/0x1b0 [80863.775960] [<ffffffffa0753890>] ? ptlrpc_register_service+0xfb0/0xfb0 [ptlrpc] [80863.775960] [<ffffffff810b4ed4>] kthread+0xe4/0xf0 [80863.775960] [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140 [80863.775960] [<ffffffff817c4c77>] ret_from_fork_nospec_begin+0x21/0x21 [80863.775960] [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140 [80863.775960] Code: 00 00 00 40 f6 c7 04 0f 85 de 00 00 00 41 89 d3 c1 ea 04 41 d1 eb 85 d2 41 89 d2 74 48 89 d1 45 31 c0 48 89 fa 66 0f 1f 44 00 00 <48> 03 02 48 13 42 08 48 13 42 10 48 13 42 18 48 13 42 20 48 13 Though I also had evidently a similar crc dump in september (but the t10 patch reowrking this area landed in June I think): [37976.307908] IP: [<ffffffff813c0ba0>] do_csum+0x70/0x180 [37976.307908] PGD 23e3067 PUD 33edfb067 PMD 33ed49067 PTE 8000000256203060 [37976.307908] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [37976.307908] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi ttm drm_kms_helper drm ata_piix i2c_piix4 virtio_console libata serio_raw pcspkr virtio_balloon virtio_blk i2c_core floppy ip_tables rpcsec_gss_krb5 [last unloaded: libcfs] [37976.307908] CPU: 10 PID: 12741 Comm: ll_ost_io05_004 Kdump: loaded Tainted: P OE ------------ 3.10.0-7.5-debug #1 [37976.349349] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [37976.349349] task: ffff8800a77e6600 ti: ffff88026c85c000 task.ti: ffff88026c85c000 [37976.349349] RIP: 0010:[<ffffffff813c0ba0>] [<ffffffff813c0ba0>] do_csum+0x70/0x180 [37976.349349] RSP: 0018:ffff88026c85f940 EFLAGS: 00010246 [37976.349349] RAX: 0000000000000000 RBX: ffff880256203000 RCX: 0000000000000040 [37976.349349] RDX: ffff880256203000 RSI: 0000000000001000 RDI: ffff880256203000 [37976.349349] RBP: ffff88026c85f940 R08: 0000000000000000 R09: 0000000000000000 [37976.349349] R10: 0000000000000040 R11: 0000000000000200 R12: 0000000000001000 [37976.349349] R13: 0000000000001000 R14: 0000000000000001 R15: 0000000000001000 [37976.349349] FS: 0000000000000000(0000) GS:ffff88033dc80000(0000) knlGS:0000000000000000 [37976.349349] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [37976.349349] CR2: ffff880256203000 CR3: 00000000933de000 CR4: 00000000000006e0 [37976.349349] Call Trace: [37976.349349] [<ffffffff813c0cbe>] ip_compute_csum+0xe/0x30 [37976.349349] [<ffffffffa09216fe>] obd_dif_ip_fn+0xe/0x10 [obdclass] [37976.349349] [<ffffffffa09217c9>] obd_page_dif_generate_buffer+0xc9/0x190 [obdclass] [37976.349349] [<ffffffffa09216f0>] ? obd_dif_crc_fn+0x20/0x20 [obdclass] [37976.349349] [<ffffffffa0ba4320>] tgt_checksum_niobuf_rw+0x310/0xd60 [ptlrpc] [37976.349349] [<ffffffffa09216f0>] ? obd_dif_crc_fn+0x20/0x20 [obdclass] [37976.349349] [<ffffffffa0b6947d>] ? __req_capsule_get+0x15d/0x700 [ptlrpc] [37976.349349] [<ffffffffa0ba6b40>] tgt_brw_read+0xc40/0x1e80 [ptlrpc] [37976.349349] [<ffffffff811eae74>] ? __kmalloc+0x634/0x660 [37976.349349] [<ffffffff813bf104>] ? vsnprintf+0x234/0x6a0 [37976.349349] [<ffffffffa08ee9c9>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [37976.349349] [<ffffffffa0b7b600>] ? null_alloc_rs+0x110/0x330 [ptlrpc] [37976.349349] [<ffffffffa0b44a66>] ? lustre_pack_reply_v2+0x166/0x290 [ptlrpc] [37976.349349] [<ffffffffa0b44bff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [37976.349349] [<ffffffffa0b44d81>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [37976.349349] [<ffffffffa0bab755>] tgt_request_handle+0xaf5/0x1590 [ptlrpc] [37976.349349] [<ffffffffa07b4017>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [37976.349349] [<ffffffffa0b4fea6>] ptlrpc_server_handle_request+0x256/0xad0 [ptlrpc] [37976.349349] [<ffffffff810b9398>] ? __wake_up_common+0x58/0x90 [37976.349349] [<ffffffff813ccd2b>] ? do_raw_spin_unlock+0x4b/0x90 [37976.349349] [<ffffffffa0b53c9e>] ptlrpc_main+0xabe/0x1f80 [ptlrpc] [37976.349349] [<ffffffffa0b531e0>] ? ptlrpc_register_service+0xeb0/0xeb0 [ptlrpc] [37976.349349] [<ffffffff810ae864>] kthread+0xe4/0xf0 [37976.349349] [<ffffffff810ae780>] ? kthread_create_on_node+0x140/0x140 [37976.349349] [<ffffffff81783777>] ret_from_fork_nospec_begin+0x21/0x21 [37976.349349] [<ffffffff810ae780>] ? kthread_create_on_node+0x140/0x140 [37976.349349] Code: 00 00 00 40 f6 c7 04 0f 85 de 00 00 00 41 89 d3 c1 ea 04 41 d1 eb 85 d2 41 89 d2 74 48 89 d1 45 31 c0 48 89 fa 66 0f 1f 44 00 00 <48> 03 02 48 13 42 08 48 13 42 10 48 13 42 18 48 13 42 20 48 13 [37976.349349] RIP [<ffffffff813c0ba0>] do_csum+0x70/0x180 |
| Comment by James A Simmons [ 26/Nov/18 ] |
|
Most interesting. I remember Oleg running into this problem with patch https://review.whamcloud.com/#/c/33363. I couldn't see why this error was showing up so easily with that patch. This might be the cause. |
| Comment by Li Xi [ 27/Nov/18 ] |
|
I am checking whether there is any difference between tgt_checksum_niobuf_t10pi() and osc_checksum_bulk_t10pi(). The extent range is [326-4194303], which is not 4K aligned. I guess there might be some corner cases (e.g. page not zeroed) which might cause the problem. |
| Comment by Alex Zhuravlev [ 27/Nov/18 ] |
|
I saw a similar problem with non-aligned writes where the bulk puts data into a page with non-zero offset. this can't be reproduced with adler algo. probably t10 code doesn't take lnb_page_offset into account (like osd-zfs didn't). |
| Comment by Gerrit Updater [ 27/Nov/18 ] |
|
Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/33727 |
| Comment by Li Xi [ 27/Nov/18 ] |
|
I found a problem of osc_checksum_bulk_t10pi(). Alex, would you please check whether it helps? |
| Comment by Alex Zhuravlev [ 27/Nov/18 ] |
|
please, try with test in https://review.whamcloud.com/#/c/33726/2/lustre/tests/sanity.sh |
| Comment by Andreas Dilger [ 27/Nov/18 ] |
|
It would be best to rebase 33727 on top of 33726, and then modify the new sanity test to run through all of the checksum types. |
| Comment by Alex Zhuravlev [ 28/Nov/18 ] |
|
lixi_wc, is page offset used in T10 algo? |
| Comment by Li Xi [ 28/Nov/18 ] |
|
Hi Alex, page offset is not used in T10 algorithm. |
| Comment by Alex Zhuravlev [ 28/Nov/18 ] |
|
lixi_wc then please try to run the test mentioned above. basically the client sends partial non-aligned page, but the server puts data with page offset=0. |
| Comment by Alex Zhuravlev [ 28/Nov/18 ] |
|
so that test from |
| Comment by Andreas Dilger [ 28/Nov/18 ] |
|
Alex, is that still failing with the 33727 patch applied on top of your 33726 patch? |
| Comment by Alex Zhuravlev [ 29/Nov/18 ] |
|
adilger, yes it still fails with 4K version, but passes with 512. |
| Comment by Li Xi [ 29/Nov/18 ] |
|
The sanity:810 test that https://review.whamcloud.com/#/c/33726/ adds is still 512 bytes aligned. I guess that is why T10PI512 passes the test, but T10PI4096 doesn't. If the size is not 512 bytes aliged, I guess T10PI512 will fail too. |
| Comment by Alex Zhuravlev [ 29/Nov/18 ] |
|
lixi_wc, can you please explain why it fails depending on page offset? if we put data (2K, iirc) at 2K offset, then the checksums match, but if we put data at 0 offset, then the checksums don't match. probably I missed something, of course.. |
| Comment by Li Xi [ 29/Nov/18 ] |
|
obd_page_dif_generate_buffer() doesn't handle the non-aligned buffer well. I am checking how T10PI handles the unaligned buffer. |
| Comment by Gerrit Updater [ 29/Nov/18 ] |
|
Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/33752 |
| Comment by Li Xi [ 29/Nov/18 ] |
|
Please check whether the patch 33752 on top of 33727 works well. I am setuping an environment to test it. And also, setuping a environment with T10PI hardware to check whether everything works. |
| Comment by Li Xi [ 29/Nov/18 ] |
|
With patch 33752 on top of 33727, I can not reproduce the problem. I am not sure I missed anything or not, since I didn't test the original Lustre without these patches.
# df | grep mnt
/dev/sdb1 102275928 62196 96970852 1% /mnt/mgs
/dev/sdb2 120662808 75284 110101768 1% /mnt/mdt
/dev/sdb3 170773236 61960 162004256 1% /mnt/ost
10.0.0.37@tcp:/server17 170773236 61960 162004256 1% /mnt/lustre
# lfs getstripe file
file
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 0
obdidx objid objid group
0 2 0x2 0
# cat /proc/fs/lustre/osc/server17-OST0000-osc-ffff8805c03ff000/checksum_type
crc32 adler crc32c t10ip512 [t10ip4K] t10crc512 t10crc4K
# lctl set_param fail_loc=0x411
fail_loc=0x411
# dd if=/dev/urandom of=file bs=10240 count=2
2+0 records in
2+0 records out
20480 bytes (20 kB) copied, 0.0139395 s, 1.5 MB/s
# md5sum file
5f67fa4f3b81beaab254a36276f659ce file
# lctl set_param ldlm.namespaces.*osc*.lru_size=clear
ldlm.namespaces.server17-OST0000-osc-MDT0000.lru_size=clear
ldlm.namespaces.server17-OST0000-osc-ffff8805c03ff000.lru_size=clear
# md5sum file
5f67fa4f3b81beaab254a36276f659ce file
|
| Comment by Li Xi [ 29/Nov/18 ] |
|
@Alex, I can not reproduce the problem even with only patch 33727. Would you please let me know what are the exact steps to reproduce? |
| Comment by Alex Zhuravlev [ 29/Nov/18 ] |
|
lixi_wc, revert 83cb17031913ba2f33a5b67219a03c5605f48f27 ( |
| Comment by Li Xi [ 30/Nov/18 ] |
|
Hi Alex, I am confused. Why reverting |
| Comment by Li Xi [ 30/Nov/18 ] |
|
I think multiple problems mixed here: |
| Comment by Li Xi [ 30/Nov/18 ] |
|
I can't reproduce the problem using sanity:810 test even with no patch of this ticket. So, what is the exact process of reproducing the problem? |
| Comment by Hongchao Zhang [ 30/Nov/18 ] |
|
This issue can be reproduced with checksum type "t10ip512", "t10ip4K", "t10crc512" and "t10crc4K", and won't be the patch from |
| Comment by Li Xi [ 30/Nov/18 ] |
|
Hongchao and Alex, I don't understand why That said, I will try to reproduce the probelm without reverting |
| Comment by Alex Zhuravlev [ 30/Nov/18 ] |
|
lixi_wc how current approach (which requires matching offsets) works on setups where page size between client and server doesn't match? |
| Comment by Li Xi [ 30/Nov/18 ] |
|
Hi Alex, good question. I did assume that page sizes are the same on clients and servers. However, even page sizes are different, it is still the same. In tgt_brw_write(), the server asked to transfer the data into the page with lnb_page_offset. And thus, the server should use this offset of the page to access the data, including writing to disk and calculating the RPC checksum. |
| Comment by Alex Zhuravlev [ 30/Nov/18 ] |
|
so there should be no issue whether server put data (and write from) offset=1K or offset=0K, right? |
| Comment by Li Xi [ 30/Nov/18 ] |
|
Hi Alex, I guess this might be related to the characteristics of checksum algorithms? I am not an expert on this, but I guess not all checksum types generate the same value for the following two process: 1) Calculate checksum of 3K byte and them accumulate checksum of 1K byte Even the 3K byte data + 1K byte data = 4 K byte data, the checksum values of 1) and 2) might be different for some checksum types, right? I am not sure about whether T10PI checksum or crc32, crc32c supports this feature. But let's assume there is an algorithm which doesn't support this feature. Then, if we don't apply the patch of The problem |
| Comment by Li Xi [ 30/Nov/18 ] |
|
I tend to believe 4K T10PI checksum doesn't support this feature, since it should calculate the whole page for each sector. Thus, merging data belongs to two pages on server side will corrupt the checksum for sure... |
| Comment by Li Xi [ 30/Nov/18 ] |
|
Hmm, I think https://review.whamcloud.com/#/c/32899 is still needed, because of the reason that I said. And I think the key reason here is not the line of "lnb[i].lnb_page_offset = poff;"
plen = min_t(int, poff + sz_in_block,
PAGE_SIZE);
plen -= poff;
This fixes the problem of merging two pages into a single one. |
| Comment by Andreas Dilger [ 30/Nov/18 ] |
|
There is still a problem with T10-PI checksums for partial pages, and I don't think that https://review.whamcloud.com/33752 is fixing the problem properly. That patch makes the T10-PI checksums consistent between the client and the server, but runs the risk of corrupting the data in the rest of the page that is being checksummed. Also, the main problem is that a T10-PI checksum on a partial sector is never going to be correct when sent to the underlying storage. What needs to be done is to have two separate T10-PI checksums for the first and last sectors, if they are partially overwritten. One checksum on the partial sector is needed to compute the checksum-of-checksums and match with the RPC checksum. A second checksum needs to be computed on a whole sector after it is modified, in order to submit to disk. If whole sectors are being modified (e.g. in the 10240-byte writes and 512-byte sector size) then the partial-sector handling is not needed. In order to ensure that there is proper end-to-end integrity for the checksums, when first or last partial page has been read from disk, but before the overwrite is done, the partial checksum should be computed for the areas that are not going to be overwritten (start of first partial sector, end of last partial sector). Ideally, the full sector will be checksummed again afterward to verify it still matches the original full-sector checksum, to verify that the partial sector checksum was done on correct data. Then, after the BRW/RDMA has overwritten the partial page, a full-sector checksum needs to be computed from the modified data. Next, the checksum on the first partial-sector checksum should be verified to match the original value computed, and the end partial-sector checksum should be verified against the value sent from the client. That ensures the two (non-overlapping) parts of the sector match the original data, and that the new full-sector checksum was done on the valid data. The full-sector checksum should be used when submitting the bio. For 2.12 it may be enough to read the page, apply the write, compute the checksum on the full sector for bio submission, and then verify that the partial-sector checksum matches the checksum from the client. This ensures the data sent from the network is valid, with an overlap for the full-sector checksum. The risk of corrupting the partial sector in RAM between the read/verify from disk and the overwrite is very small, and if this is happening then it is likely we would see other corruption on the full-sector writes as well. However, doing the extra partial-sector checksum for the unmodified part of the sector (as described above) would catch the case if the partial-sector overwrite is done incorrectly. |
| Comment by Li Xi [ 01/Dec/18 ] |
|
Hi Andreas, I agree on the long term ideal solution and 2.12 fix. And agreed that the implementation on 2.12 might be enough for most of the cases. The ideal solution is too complex thus error prone. And I doubt the correctness and usefulness of the ideal solution unless we have complex codes for fault injection. Anyway, I will refresh the https://review.whamcloud.com/33752 patch. |
| Comment by Hongchao Zhang [ 03/Dec/18 ] |
|
the checksum mismatch issue also exists for LDiskFS backend. |
| Comment by Gerrit Updater [ 03/Dec/18 ] |
|
Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33768 |
| Comment by Gerrit Updater [ 04/Dec/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33727/ |
| Comment by Gerrit Updater [ 04/Dec/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33752/ |
| Comment by Peter Jones [ 04/Dec/18 ] |
|
Landed for 2.12 |