<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:30:55 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16897] Optimize sparse file reads</title>
                <link>https://jira.whamcloud.com/browse/LU-16897</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Currently, when a client is reading a sparse file, it doesn&apos;t know whether there is data at any particular file offset, so it will pre-allocate pages and set up the RDMA for the full range of the reads. If the file is sparse (has a hole in the allocation) or has unwritten extents that return zeroes on read, the OST will zero-fill all of the requested pages and transfer the zeroes over the network. &lt;/p&gt;

&lt;p&gt;It would be desirable to avoid sending the pages of zeroes over the network, to reduce network bandwidth and some CPU overhead on the OSS to zero out the pages.  IIRC, the BRW WRITE reply returns an array of rc values for each page to indicate success/failure for each one. I would expect BRW READ to return a special state for each page that indicates that it is a hole.  &lt;/p&gt;

&lt;p&gt;However, this is also complicated by the RDMA configuration, since it has already mapped the pages from the read buffer (which may be specific page cache pages).  The best solution would be for the LNet bulk transfer to &quot;not send&quot; those pages in the middle of the RDMA, and have LNet (or the RDMA engine) zero-fill the pages on the client without actually sending them over the wire, but I have no idea how easy or hard that would be to implement. &lt;/p&gt;

&lt;p&gt;Failing that, if the holes are packed in the server-side bulk transfer setup (and it is possible to send only the first pages in a shorter bulk transfer), then the client would need to &lt;tt&gt;memcpy()&lt;/tt&gt; the data into the correct pages (from last page to first) and zero the pages in the hole itself.  That would add CPU/memory overhead on the client, and would not work for RDMA offload like GDS. &lt;/p&gt;</description>
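The packed-transfer fallback described above (client copies data into the correct pages from last to first, zero-filling the holes) can be sketched as follows; all names and structures here are hypothetical illustrations, not Lustre or LNet APIs:

```python
# Sketch of the client-side fallback: the server packs only non-hole pages
# into a shorter bulk transfer; the client moves them into their final
# positions (last page to first) and zero-fills the holes.
PAGE_SIZE = 4096

def unpack_sparse_read(packed, hole_map):
    """packed: data pages (bytes) in file order, holes omitted.
    hole_map: one bool per page of the full read range; True means hole.
    Returns the full list of pages."""
    total = len(hole_map)
    pages = [None] * total
    src = len(packed) - 1
    # Walk from the last page to the first, so an in-place unpack of a
    # shared buffer would never overwrite data that has not moved yet.
    for i in range(total - 1, -1, -1):
        if hole_map[i]:
            pages[i] = bytes(PAGE_SIZE)   # zero-fill the hole
        else:
            pages[i] = packed[src]
            src -= 1
    assert src == -1, "hole map disagrees with packed page count"
    return pages
```

This illustrates why the issue text notes the copy must run from last page to first: going forward would clobber not-yet-moved data when unpacking in place.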
                <environment></environment>
        <key id="76571">LU-16897</key>
            <summary>Optimize sparse file reads</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="cbordage">Cyril Bordage</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Wed, 14 Jun 2023 15:39:57 +0000</created>
                <updated>Tue, 12 Dec 2023 14:05:49 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="375411" author="adilger" created="Wed, 14 Jun 2023 16:01:21 +0000"  >&lt;p&gt;Serguei, Chris, I&apos;m wondering if there is any mechanism like the one I describe above to avoid bulk transfer of zeroes over the wire in LNet/OFED/IB/OFI?  It seems like something that might exist (e.g. like &lt;tt&gt;WRITE_SAME&lt;/tt&gt; in the SCSI layer that can be used to zero-fill disk blocks without writing all of the zeroes over the bus). It might be useful to see how NVMeoF is implementing &lt;tt&gt;ioctl(BLKZEROOUT)&lt;/tt&gt; over the wire to see if there is protocol support for this. &lt;/p&gt;

&lt;p&gt;Implementing this in the LNet bulk transfer layer (with suitable input from Lustre about which pages are zeroes) would be far less complex than doing it at the Lustre protocol level. That would work with GPU Direct to zero fill the pages in the GPU without having to shift the data around on the client.&lt;/p&gt;

&lt;p&gt;The other solution that comes to mind would be for the client to &quot;pre-map&quot; the list of valid pages before setting up the bulk transfer and do a sparse read, but that would likely add a lot of overhead (one extra RPC round trip for every read, plus waiting in the RPC queue under congestion) and would be racy unless the whole RPC range is locked at the time.&lt;/p&gt;</comment>
                            <comment id="375857" author="adilger" created="Mon, 19 Jun 2023 16:27:58 +0000"  >&lt;p&gt;Cyril said he would look into what is possible in this area.&lt;/p&gt;</comment>
                            <comment id="375859" author="paf0186" created="Mon, 19 Jun 2023 17:12:51 +0000"  >&lt;p&gt;So, a note from the person who thinks about the page cache all the time (Andreas seems to know this already based on his comments, but I wanted to write it out explicitly):&lt;/p&gt;

&lt;p&gt;It&apos;s a requirement that we end up with real, zero filled pages in the page cache.&#160; The Linux page cache doesn&apos;t know about holes - it expects the layers below it to give it zero-filled pages when it tries to read a hole in a file and it will just keep those zero filled pages around.&#160; This isn&apos;t something we can change.&lt;/p&gt;

&lt;p&gt;So if we can zero-fill the pages efficiently somehow in the LNet/RDMA layers without sending zeroes over the wire, great; otherwise we&apos;ll have to capture the info in the Lustre protocol and have the client (I think at the BRW level) zero-fill the pages locally.&#160; Either way we have to zero-fill those pages.&#160; (This would also be complicated because there&apos;s no obvious way to express hole boundaries in the protocol, since the most important holes for us are probably the holes created by compression, and those holes are only part of a niobuf.)&lt;/p&gt;</comment>
                            <comment id="376343" author="cbordage" created="Fri, 23 Jun 2023 12:44:59 +0000"  >&lt;p&gt;Just a little update.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=adilger&quot; class=&quot;user-hover&quot; rel=&quot;adilger&quot;&gt;adilger&lt;/a&gt;, AFAIK, there is no such mechanism to avoid bulk transfer of zeroes over the wire in LNet/OFED/IB/OFI.&lt;/p&gt;

&lt;p&gt;I studied BLKZEROOUT in NVMe-oF and haven&apos;t found anything interesting. Some details are missing in my analysis. I will keep working on that.&lt;/p&gt;

&lt;p&gt;At the same time, I am studying how Lustre uses LNet, to be sure of what we would have to modify in LNet and how. Do you have any good pointers for that?&lt;/p&gt;</comment>
                            <comment id="376404" author="adilger" created="Sat, 24 Jun 2023 03:06:05 +0000"  >&lt;p&gt;The interesting interaction on the client in this case is &lt;tt&gt;ptlrpc_register_bulk()&lt;/tt&gt;, where the pages are registered with LNet for RDMA transfer. There is a corresponding function on the server, whose name escapes me at this point.  I&apos;ll take a look at the code to find it. &lt;/p&gt;</comment>
                            <comment id="380147" author="cbordage" created="Wed, 26 Jul 2023 08:39:42 +0000"  >&lt;p&gt;To keep the modifications to Lustre small, I chose not to modify ptlrpc_register_bulk, but instead to make changes closer to the LNet level.&lt;/p&gt;

&lt;p&gt;I wrote a proof of concept where I modified lnet_md_build to take sparse pages into account when building lnet_libmd. With the change at this location, I expect fewer modifications will be needed, and it will be more transparent.&lt;/p&gt;

&lt;p&gt;For now, I am trying to fix some issues in handling the size. My next step will be to retrieve the sparseness information.&lt;/p&gt;</comment>
                            <comment id="387116" author="JIRAUSER17312" created="Mon, 25 Sep 2023 16:02:19 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=cbordage&quot; class=&quot;user-hover&quot; rel=&quot;cbordage&quot;&gt;cbordage&lt;/a&gt;&#160;&lt;/p&gt;

&lt;p&gt;How is this progressing? Do you have a patch ready to submit?&lt;/p&gt;</comment>
                            <comment id="391575" author="gerrit" created="Fri, 3 Nov 2023 02:52:23 +0000"  >&lt;p&gt;&quot;Cyril Bordage &amp;lt;cbordage@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/52970&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/52970&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16897&quot; title=&quot;Optimize sparse file reads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16897&quot;&gt;LU-16897&lt;/a&gt; lnet: optimize sparse file reads&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7274d263e90892078d58d458347ee9b29fbdc291&lt;/p&gt;</comment>
                            <comment id="391576" author="cbordage" created="Fri, 3 Nov 2023 02:55:14 +0000"  >&lt;p&gt;Hello,&lt;/p&gt;

&lt;p&gt;I wanted to give some updates before my two-week leave.&lt;br/&gt;
Unfortunately, the code is not in a stable state. Indeed, yesterday, I got another idea on how to transmit and apply sparseness information, and started the modifications.&lt;br/&gt;
I am not sure the code will help, but I am pushing it anyway to have a base to work from.&lt;/p&gt;

&lt;p&gt;I do not know yet how to retrieve sparseness information. In my tests, I simulated that by detecting pages filled with 0s (in ptlrpc_fill_bulk_md). Of course, it won&apos;t be used in the final code.&lt;/p&gt;

&lt;p&gt;Here is my current strategy for TCP:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;If the data to send contains sparse pages, the server will send an LNET_MSG_PUT_SPARSE message type&lt;/li&gt;
	&lt;li&gt;To exchange sparseness information, a sparseness header is added when the message type is LNET_MSG_PUT_SPARSE. This header will contain the number of empty pages, and their list (maximum 256 if PAGE_SIZE is 4kB).&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;On the OSS side:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;lnet_extract_kiov is modified to skip sparse pages&lt;/li&gt;
	&lt;li&gt;In ksocknal_transmit, we also send the sparseness header&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;On the client side:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Receiving an LNET_MSG_PUT_SPARSE message will lead the client to read the sparseness header&lt;/li&gt;
	&lt;li&gt;lnet_extract_kiov will use this header to zero-fill the sparse pages and skip them when building the destination bio_vec&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Previously, in the code I pushed (even if some parts no longer work), I was using a &quot;sparse&quot; page in the kiov to store sparseness information; that won&apos;t be the case in the new implementation.&lt;br/&gt;
I think the new implementation will be cleaner, and it will be easier to add the RDMA part.&lt;/p&gt;</comment>
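The sparseness header described in the comment above (a count of empty pages plus their list, at most 256 entries for 4 KiB pages) could be encoded roughly as follows; the wire layout, field widths, and function names are illustrative assumptions, not the actual patch:

```python
# Sketch of a sparseness header: a big-endian u16 count of empty pages,
# followed by one u16 index per empty page. Layout is hypothetical.
import struct

MAX_HOLES = 256  # 4 KiB pages in a 1 MiB bulk transfer

def pack_sparse_header(hole_indices):
    """Encode the number of empty pages and their page indices."""
    assert MAX_HOLES >= len(hole_indices)
    return struct.pack(">H%dH" % len(hole_indices),
                       len(hole_indices), *hole_indices)

def unpack_sparse_header(buf):
    """Decode the header; returns the list of empty-page indices."""
    (count,) = struct.unpack_from(">H", buf, 0)
    return list(struct.unpack_from(">%dH" % count, buf, 2))
```

With u16 indices, a fully sparse 1 MiB transfer costs 2 + 256 * 2 = 514 header bytes instead of a megabyte of zeroes on the wire.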
                            <comment id="391579" author="paf0186" created="Fri, 3 Nov 2023 03:19:20 +0000"  >&lt;p&gt;Cyril,&lt;/p&gt;

&lt;p&gt;This sounds good to me, and a header seems better than a page in the kiov.&lt;/p&gt;

&lt;p&gt;What is the issue with the source of the sparseness information, exactly?&#160; Like, when and where in the code do you need it?&#160; (Maybe someone could help with that, which is why I ask.)&lt;/p&gt;</comment>
                            <comment id="391623" author="adilger" created="Fri, 3 Nov 2023 10:57:10 +0000"  >&lt;p&gt;Are there 256 bits in the header to indicate sparseness?  That is 32 bytes per MB, which is not too bad. The size of this bitmap should be flexible/stored in the header itself, in case the RDMA size changes?  Mind you, even for large PAGE_SIZE the &lt;b&gt;number&lt;/b&gt; of pages will be the same...&lt;/p&gt;

&lt;p&gt;As for sparseness information, that is available in osd-ldiskfs and possibly OFD via FIEMAP.  This could be pushed down to LNet as part of the bulk descriptor setup, rather than having LNet &quot;detect&quot; this itself. &lt;/p&gt;</comment>
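The bitmap-size arithmetic in the comment above can be checked directly; the sizes assumed here (4 KiB pages, 1 MiB bulk transfers) follow the thread's own example:

```python
# One bit per page of a bulk transfer: 256 pages of 4 KiB in 1 MiB,
# so the bitmap costs 32 bytes per MiB transferred.
PAGE_SIZE = 4096
RDMA_SIZE = 1024 * 1024              # 1 MiB bulk transfer
pages_per_rdma = RDMA_SIZE // PAGE_SIZE
bitmap_bytes = pages_per_rdma // 8   # 8 bits per byte
assert pages_per_rdma == 256
assert bitmap_bytes == 32
```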
                            <comment id="394881" author="paf0186" created="Thu, 30 Nov 2023 05:33:43 +0000"  >&lt;p&gt;Cyril,&lt;/p&gt;

&lt;p&gt;Any update here?&#160; Just curious.&#160; While working on something related, I learned how to get the needed sparseness info and I think how we could present it to lnet (we could know if a page is sparse and communicate that when &quot;add_kiov_frag&quot; is called on the server to populate the RDMA).&lt;/p&gt;

&lt;p&gt;You mentioned TCP, is there any part of this that wouldn&apos;t work on o2ib/infiniband?&#160; That&apos;s our most important target.&lt;/p&gt;

&lt;p&gt;In fact, I could even give you a code snippet to provide that information when add_kiov_frag is called if that would be helpful - it&apos;s quite easy to detect when you&apos;re reading from a hole, at least for ldiskfs.&#160; (ZFS is different, haven&apos;t sorted that out)&lt;/p&gt;</comment>
                            <comment id="394942" author="cbordage" created="Thu, 30 Nov 2023 13:21:07 +0000"  >&lt;p&gt;Hello Patrick,&lt;/p&gt;

&lt;p&gt;I was on holiday and only had time to work a little on how to get sparseness information. So your message comes at the right moment.&lt;br/&gt;
I studied &lt;em&gt;osd_ldiskfs_map_inode_pages&lt;/em&gt;, where I found the needed information, but the index of sparse pages was correct only on the first access. I assumed it was due to some cache mechanism. So, I will look at add_kiov_frag.  If it is not too much trouble for you, a little code snippet would be helpful.&lt;/p&gt;

&lt;p&gt;For RDMA, the strategy is different. On the server, the part that changes is kiblnd_init_rdma: we skip some destination pages to match the server layout. On its side, the client will receive the list of holes so it can zero-fill the skipped pages.&lt;/p&gt;

&lt;p&gt;Thank you.&lt;/p&gt;</comment>
                            <comment id="394976" author="paf0186" created="Thu, 30 Nov 2023 15:54:42 +0000"  >&lt;p&gt;OK, I can send over a small patch - it can mark the pages as &apos;holes&apos; when they&apos;re read from disk (on ldiskfs, not sure about ZFS, but that can come later) and then &apos;un-hole&apos; them when we do a write to them.&#160; I think that should suffice.&lt;/p&gt;

&lt;p&gt;It would be good if we could use the same approach for all LNDs - I&apos;m not sure that&apos;s possible, though; it sounds like it&apos;s not.&#160; I guess the issue is that the client makes a read request, and the server does not have the sparseness information until it is setting up the transfer.&#160; (BTW, if it makes things easier, I suggest ignoring sparse handling for &apos;short_io&apos; transfers, at least in an initial version.&#160; It is a less important case.)&lt;/p&gt;

&lt;p&gt;I&apos;ll post that small patch a little later.&lt;/p&gt;</comment>
                            <comment id="394995" author="gerrit" created="Thu, 30 Nov 2023 17:01:35 +0000"  >&lt;p&gt;&quot;Patrick Farrell &amp;lt;pfarrell@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/53297&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/53297&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16897&quot; title=&quot;Optimize sparse file reads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16897&quot;&gt;LU-16897&lt;/a&gt; tgt: note &apos;hole&apos; pages&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 49393dd3bba82b5e405cd42fe3c048de44357d95&lt;/p&gt;</comment>
                            <comment id="394996" author="paf0186" created="Thu, 30 Nov 2023 17:03:58 +0000"  >&lt;p&gt;Cyril,&lt;/p&gt;

&lt;p&gt;I just pushed that code here.&#160; The commit message includes some additional information that will hopefully be useful.&lt;/p&gt;

&lt;p&gt;Here&apos;s a quick example of this working.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#This example uses a slightly larger file so we don&apos;t do short_io.
# 4K at start of file
dd if=/dev/zero bs=4K count=1 of=./newfile conv=notrunc
sync

# 4K at offset of 128K - data from 0-4K, hole from 4K to 128K, data from 128K to 132K
dd if=/dev/zero bs=4K seek=32 count=1 of=./newfile conv=notrunc
sync

# clear cache to force read
echo 3 &amp;gt; /proc/sys/vm/drop_caches
lctl set_param *debug=-1 debug_mb=10000
lctl clear
# read all of the file at once
dd if=./newfile bs=1M count=1 of=/tmp/newfile_copy
lctl dk &amp;gt; /tmp/out
lctl set_param *debug=0 debug_mb=0 

grep &quot;hole &quot; /tmp/out&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here&apos;s what the output of that grep looks like:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00080000:00000002:2.0:1701363786.690057:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 1, block_idx 1, at offset 4096
00080000:00000002:2.0:1701363786.690058:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 2, block_idx 2, at offset 8192
00080000:00000002:2.0:1701363786.690059:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 3, block_idx 3, at offset 12288
00080000:00000002:2.0:1701363786.690059:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 4, block_idx 4, at offset 16384
00080000:00000002:2.0:1701363786.690061:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 5, block_idx 5, at offset 20480
00080000:00000002:2.0:1701363786.690062:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 6, block_idx 6, at offset 24576
00080000:00000002:2.0:1701363786.690063:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 7, block_idx 7, at offset 28672
00080000:00000002:2.0:1701363786.690064:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 8, block_idx 8, at offset 32768
00080000:00000002:2.0:1701363786.690064:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 9, block_idx 9, at offset 36864
00080000:00000002:2.0:1701363786.690065:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 10, block_idx 10, at offset 40960
00080000:00000002:2.0:1701363786.690066:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 11, block_idx 11, at offset 45056
00080000:00000002:2.0:1701363786.690066:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 12, block_idx 12, at offset 49152
00080000:00000002:2.0:1701363786.690067:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 13, block_idx 13, at offset 53248
00080000:00000002:2.0:1701363786.690068:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 14, block_idx 14, at offset 57344
00080000:00000002:2.0:1701363786.690069:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 15, block_idx 15, at offset 61440
00080000:00000002:2.0:1701363786.690070:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 16, block_idx 16, at offset 65536
00080000:00000002:2.0:1701363786.690071:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 17, block_idx 17, at offset 69632
00080000:00000002:2.0:1701363786.690072:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 18, block_idx 18, at offset 73728
00080000:00000002:2.0:1701363786.690072:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 19, block_idx 19, at offset 77824
00080000:00000002:2.0:1701363786.690073:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 20, block_idx 20, at offset 81920
00080000:00000002:2.0:1701363786.690073:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 21, block_idx 21, at offset 86016
00080000:00000002:2.0:1701363786.690074:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 22, block_idx 22, at offset 90112
00080000:00000002:2.0:1701363786.690075:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 23, block_idx 23, at offset 94208
00080000:00000002:2.0:1701363786.690075:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 24, block_idx 24, at offset 98304
00080000:00000002:2.0:1701363786.690077:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 25, block_idx 25, at offset 102400
00080000:00000002:2.0:1701363786.690078:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 26, block_idx 26, at offset 106496
00080000:00000002:2.0:1701363786.690079:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 27, block_idx 27, at offset 110592
00080000:00000002:2.0:1701363786.690079:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 28, block_idx 28, at offset 114688
00080000:00000002:2.0:1701363786.690080:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 29, block_idx 29, at offset 118784
00080000:00000002:2.0:1701363786.690081:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 30, block_idx 30, at offset 122880
00080000:00000002:2.0:1701363786.690082:0:5679:0:(osd_io.c:450:osd_do_bio()) hole at page_idx 31, block_idx 31, at offset 126976
00000020:00000002:2.0:1701363786.690292:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 0, at offset 0, hole 0
00000020:00000002:2.0:1701363786.690292:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 1, at offset 4096, hole 1
00000020:00000002:2.0:1701363786.690293:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 2, at offset 8192, hole 1
00000020:00000002:2.0:1701363786.690293:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 3, at offset 12288, hole 1
00000020:00000002:2.0:1701363786.690293:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 4, at offset 16384, hole 1
00000020:00000002:2.0:1701363786.690294:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 5, at offset 20480, hole 1
00000020:00000002:2.0:1701363786.690294:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 6, at offset 24576, hole 1
00000020:00000002:2.0:1701363786.690294:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 7, at offset 28672, hole 1
00000020:00000002:2.0:1701363786.690295:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 8, at offset 32768, hole 1
00000020:00000002:2.0:1701363786.690295:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 9, at offset 36864, hole 1
00000020:00000002:2.0:1701363786.690295:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 10, at offset 40960, hole 1
00000020:00000002:2.0:1701363786.690296:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 11, at offset 45056, hole 1
00000020:00000002:2.0:1701363786.690296:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 12, at offset 49152, hole 1
00000020:00000002:2.0:1701363786.690296:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 13, at offset 53248, hole 1
00000020:00000002:2.0:1701363786.690296:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 14, at offset 57344, hole 1
00000020:00000002:2.0:1701363786.690297:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 15, at offset 61440, hole 1
00000020:00000002:2.0:1701363786.690297:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 16, at offset 65536, hole 1
00000020:00000002:2.0:1701363786.690297:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 17, at offset 69632, hole 1
00000020:00000002:2.0:1701363786.690298:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 18, at offset 73728, hole 1
00000020:00000002:2.0:1701363786.690298:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 19, at offset 77824, hole 1
00000020:00000002:2.0:1701363786.690298:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 20, at offset 81920, hole 1
00000020:00000002:2.0:1701363786.690299:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 21, at offset 86016, hole 1
00000020:00000002:2.0:1701363786.690299:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 22, at offset 90112, hole 1
00000020:00000002:2.0:1701363786.690299:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 23, at offset 94208, hole 1
00000020:00000002:2.0:1701363786.690300:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 24, at offset 98304, hole 1
00000020:00000002:2.0:1701363786.690300:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 25, at offset 102400, hole 1
00000020:00000002:2.0:1701363786.690301:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 26, at offset 106496, hole 1
00000020:00000002:2.0:1701363786.690301:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 27, at offset 110592, hole 1
00000020:00000002:2.0:1701363786.690301:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 28, at offset 114688, hole 1
00000020:00000002:2.0:1701363786.690302:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 29, at offset 118784, hole 1
00000020:00000002:2.0:1701363786.690302:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 30, at offset 122880, hole 1
00000020:00000002:2.0:1701363786.690302:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 31, at offset 126976, hole 1
00000020:00000002:2.0:1701363786.690302:0:5679:0:(tgt_handler.c:2438:tgt_brw_read()) lnb 32, at offset 131072, hole 0 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="395344" author="cbordage" created="Mon, 4 Dec 2023 14:29:23 +0000"  >&lt;p&gt;Hello Patrick,&lt;/p&gt;

&lt;p&gt;thank you for the code snippet. I did try reading sparseness information in osd_do_bio as you did, but, as I said in a previous comment, I had a problem on a second read (because of some cache mechanism). However, I hadn&apos;t tried to check it from tgt_brw_read, as you did in your code. I was glad to see the value in tgt_brw_read was correct, so I no longer have a cache problem.&lt;br/&gt;
So, what cache problem are you referring to? I am sorry, I did not follow what you explained.&lt;/p&gt;</comment>
                            <comment id="395425" author="adilger" created="Mon, 4 Dec 2023 23:31:11 +0000"  >&lt;p&gt;Patrick, it looks like your patch will note holes in the file on the initial read from the backing OSD inode, but there is no way (yet) to save this across RPCs.  So if two clients read the same file only the first one would exclude the holes from the bulk RPC transfer.&lt;/p&gt;

&lt;p&gt; It would be very useful in this case to mark the pages in the cache with a &quot;hole&quot; flag so that they can be skipped for all clients that understand this sparse RPC mechanism. &lt;/p&gt;</comment>
                            <comment id="395426" author="paf0186" created="Mon, 4 Dec 2023 23:48:56 +0000"  >&lt;p&gt;Cyril,&lt;/p&gt;

&lt;p&gt;Andreas actually just explained the cache issue I was referring to.&#160; I&apos;ll give my own version just to be extra clear.&lt;/p&gt;

&lt;p&gt;But before I explain the cache thing, I want to note - this shouldn&apos;t affect you except at the very edge.&#160; The key point for lnet/lnd is that the &apos;hole&apos; information will be available when you call add_kiov_frag.&#160; Exactly how we get that information is a separate problem, and the current patch is good enough to do the LNet development against.&#160; (Especially because it will be easy to swap in a better way to handle the hole info, since we&apos;ll present it to you at the same place.)&lt;/p&gt;

&lt;p&gt;OK, the issue is:&lt;/p&gt;

&lt;p&gt;I&apos;m marking the &apos;hole&apos; information in the lnb.&#160; lnbs are associated with pages, but lnbs only exist for one IO.&#160; When we create an lnb, we get a page for it, and that page might already be ready in cache, so we don&apos;t read it from disk.&#160; But we&apos;re only marking the lnb as being a hole when we read from disk, so this information is lost when the lnb is destroyed after that specific read is complete.&#160; So if we&apos;re using the cache, the next read of that page will not do a read from disk, so it misses that the page is a hole.&lt;/p&gt;

&lt;p&gt;The tricky thing here, Andreas, is we don&apos;t use page-&amp;gt;private on the server and I don&apos;t think we could safely use it with ZFS pages (and using it for ldiskfs would be possible but take some work).&#160; So there&apos;s nowhere for us to store Lustre specific info attached to the page.&#160; We &lt;b&gt;might&lt;/b&gt; be able to steal some Linux page flag...&#160; We could probably use PagePrivate2, like we did for encryption on the client?&#160; That seems likely to be safe.&lt;/p&gt;</comment>
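The lifetime problem described above (per-IO lnb versus long-lived cached pages) can be sketched as follows; the classes and the flag store are hypothetical stand-ins for the kernel structures, not actual Lustre code:

```python
# Sketch of the cache issue: 'hole' state recorded on a per-IO lnb is lost
# when that IO completes; a longer-lived per-page flag (the PagePrivate2-style
# idea) lets a later cache hit still see it. All names are hypothetical.

class Lnb:
    """Stand-in for a niobuf: exists only for the duration of one IO."""
    def __init__(self, page_index):
        self.page_index = page_index
        self.hole = False

page_hole_flag = {}   # stand-in for a flag stored on the page itself

def fill_from_disk(lnb, is_hole):
    """First read: the disk read tells us whether the page is a hole."""
    lnb.hole = is_hole
    page_hole_flag[lnb.page_index] = is_hole  # persists past this lnb

def fill_from_cache(lnb):
    """Later read of an already-cached page: no disk access happens, so
    without the persistent flag, lnb.hole would silently stay False."""
    lnb.hole = page_hole_flag.get(lnb.page_index, False)
```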
                            <comment id="395478" author="cbordage" created="Tue, 5 Dec 2023 10:52:50 +0000"  >&lt;p&gt;I was not sure about the cache issue because of the tests I ran. I tested exactly what you described (two clients reading the same file) but had no issue with that. The holes were detected for both of them. Do I have to enable some option to make that fail?&lt;/p&gt;</comment>
                            <comment id="395522" author="paf0186" created="Tue, 5 Dec 2023 16:03:09 +0000"  >&lt;p&gt;Cyril,&lt;/p&gt;

&lt;p&gt;This is likely related to whether or not your test system is running on nonrotational media.&#160; This controls whether or not the page cache is on by default.&lt;/p&gt;

&lt;p&gt;But I suggest ignoring this for now while you work on the sparse reads support - presenting the hole information correctly is something I think I or Artem or someone else from the file system side can take care of relatively easily.&lt;/p&gt;</comment>
                            <comment id="395930" author="cbordage" created="Thu, 7 Dec 2023 22:38:55 +0000"  >&lt;p&gt;You can find the wiki page &lt;a href=&quot;https://wiki.whamcloud.com/pages/viewpage.action?pageId=289277787&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="395932" author="paf0186" created="Thu, 7 Dec 2023 23:13:00 +0000"  >&lt;p&gt;Cyril,&lt;/p&gt;

&lt;p&gt;Can you adjust permissions on that?&#160; I can&apos;t access it even when logged in.&#160; You might just want to move it to a different part of the Wiki - we&apos;ve got a (rarely used &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;) public design doc page here: &lt;a href=&quot;https://wiki.whamcloud.com/display/PUB/Lustre+Design+documents&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.whamcloud.com/display/PUB/Lustre+Design+documents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;--------------&lt;/p&gt;

&lt;p&gt;(here&apos;s the error I got - note the permissions from parent part)&lt;br/&gt;
&quot;You don&apos;t have permission to view this page&lt;/p&gt;

&lt;p&gt;This is because it&apos;s inheriting restrictions from a parent page. A space admin or the person who shared this page may be able to give you access.&quot;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i03o27:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>