[LU-16138] remove need for T10-PI block integrity patches from server kernel Created: 06/Sep/22 Updated: 02/Oct/23 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | Dongyang Li |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | medium | ||
| Issue Links: |
|
||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
Kernels since v4.3-rc4-30-g0f8087ecdeac separate the block integrity verify and generate functions into a separate struct blk_integrity_profile instead of copying the pointers into struct blk_integrity. This might be used to facilitate replacing the blk_integrity_profile pointer on the block device for the duration that osd-ldiskfs is mounting it, and potentially avoid the need to patch the kernel for T10-PI at all. As a starting point, we would need to call blk_integrity_register() using a template holding the integrity methods provided by osd-ldiskfs - osd_dif_type<1,3>generate<crc,ip>() and osd_dif_type<1,3>verify<crc,ip>(). It looks like this function is just copying the individual fields from the provided template into the disk queue's blk_integrity fields, so we should save the original blk_integrity to be restored when osdldiskfs unmounts the target, otherwise the kernel would crash trying to access the old function pointers if there is any block device IO after osd-ldiskfs is removed from the kernel. It seems prudent to call blk_integrity_unregister() before unloading the module, as that would force all block device IO using the osd-ldiskfs.ko methods to be finished, and then blk_integrity_register() the saved values so that T10-PI is still functional on the device. The osd_dif_generate() and osd_dif_verify() functions need to handle being called on block read/write calls that are unrelated to osd-ldiskfs BRW IOs, since that will now happen for every block in the filesystem whether or not it came from osd-ldiskfs. One way to separate OSD vs. non-OSD IO might be by putting a magic at the end of struct osd_bio_private that can be used to reliably detect if this is a BRW IO or if it is IO on some other ldiskfs block. Non-BRW blocks would return NULL from find_lnb() and not dereference bio_private if NULL, and it looks like the rest of the code will "just work". |
| Comments |
| Comment by Gerrit Updater [ 20/Sep/22 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48608 |
| Comment by Gerrit Updater [ 20/Sep/22 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48609 |
| Comment by Gerrit Updater [ 10/Oct/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48608/ |
| Comment by Peter Jones [ 10/Oct/22 ] |
|
Landed for 2.16 |
| Comment by Jian Yu [ 10/Oct/22 ] |
|
Have to reopen this issue because removing the block integrity patches has not been done. |
| Comment by Gerrit Updater [ 26/Oct/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48609/ |
| Comment by Dongyang Li [ 20/Dec/22 ] |
|
Andreas, I think patching the kernel is inevitable. We could register our own integrity_profile, or we could save the generate/verify functions and replace them with our versions, then restore them when umount osd. But without at least applying block-pass-bio-into-integrity_processing_fn the struct blk_integrity_exchg/blk_integrity_iter doesn't know which bio are we working on, without the bio we could not access the struct osd_bio_private, and could not use find_lnb() to get the guard buffers. |
| Comment by Andreas Dilger [ 21/Dec/22 ] |
|
Some potential options for discussion:
|
| Comment by Dongyang Li [ 21/Dec/22 ] |
|
Looking up bio using the bi_iter won't work, we are looking at different iter. the generate/verify_fn is using struct blk_integrity_iter and bio has struct bvec_iter. The blk_integrity_iter is also declared on stack of bio_integrity_process(). One challenge of using iter.prot_buf as a hash key with unpatched kernel is the buf is allocated by bio_integrity_prep(), and bio_integrity_prep() will generate the guards for the write case immediately, |
| Comment by Andreas Dilger [ 21/Dec/22 ] |
|
OK, what about just having a per-device hash table using the device offsets? The generate_fn is at least called with "sector_t seed" and "disk_name" that could be mapped to the niobuf. It definitely isn't ideal, but could potentially work and we could see the overhead of that method vs. patching the kernel. I also see that there is a "prepare_fn" callback in the blk_integrity_profile on the device (e.g. t10_pi_type1_ip.t10_pi_type1_prepare) that has access to the bio, which might be helpful. |
| Comment by Dongyang Li [ 22/Dec/22 ] |
|
The per-device hash table could work, if the sector won't be remapped after we submit_bio(). |