<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:22:35 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-15938] MDT recovery did not finish due to corrupt llog record</title>
                <link>https://jira.whamcloud.com/browse/LU-15938</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A broken DNE recovery llog record was preventing MDT-MDT recovery from completing.  MDT0003 was permanently unable to finish recovery with MDT0019, looping on:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;llog_process_thread()) lfs02-MDT0019-osp-MDT0003 retry remote llog process
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;There was a bad record in the llog file; recovery would process the llog (all but one other record had already been cancelled successfully), hit the bad record, abort, and then retry.&lt;/p&gt;

&lt;p&gt;Since the DNE recovery llog for MDT0003 is stored on MDT0019, this necessitated &quot;fixing&quot; the llog file on MDT0019 by truncating it to zero bytes, which allowed MDT0003 recovery to finish.&lt;/p&gt;

&lt;p&gt;Retrying recovery can be useful in some cases, such as when the remote MDT is inaccessible, but if there is a single bad record it makes sense to retry only once (in case the llog was in the middle of being written) and then cancel this record and continue with the rest of recovery, or at worst abort recovery with that MDT and cancel the whole llog file.  Otherwise, manual intervention is needed to recover from this situation, and it can do no better than cancelling the llog record (pending &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15937&quot; title=&quot;lctl llog commands do not work for DNE recovery logs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15937&quot;&gt;LU-15937&lt;/a&gt;) or deleting the whole llog file.&lt;/p&gt;
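&lt;p&gt;A minimal sketch of that bounded-retry policy, assuming hypothetical helpers (&lt;tt&gt;process_llog_once()&lt;/tt&gt;, &lt;tt&gt;cancel_llog_rec()&lt;/tt&gt;) rather than the actual Lustre API:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
/* Sketch only: process_llog_once() and cancel_llog_rec() are
 * hypothetical stand-ins, not real Lustre functions. */
#include &lt;errno.h&gt;
#include &lt;stdio.h&gt;

#define LLOG_PROCESS_RETRIES 1  /* retry once, in case the record was mid-write */

static int process_llog_once(int *bad_idx) { *bad_idx = 8245; return -EIO; }
static int cancel_llog_rec(int idx) { printf(&quot;cancelled rec #%d\n&quot;, idx); return 0; }

int main(void)
{
        int bad_idx, retries = 0, rc;

        while ((rc = process_llog_once(&amp;bad_idx)) == -EIO) {
                if (retries++ &lt; LLOG_PROCESS_RETRIES)
                        continue;               /* the write may still be in flight */
                rc = cancel_llog_rec(bad_idx);  /* then skip it so recovery can finish */
                break;
        }
        return rc ? 1 : 0;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>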
                <environment></environment>
        <key id="70717">LU-15938</key>
            <summary>MDT recovery did not finish due to corrupt llog record</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="tappro">Mikhail Pershin</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Sun, 12 Jun 2022 03:24:19 +0000</created>
                <updated>Mon, 5 Jun 2023 16:11:02 +0000</updated>
                            <resolved>Thu, 8 Dec 2022 00:14:59 +0000</resolved>
                                    <version>Lustre 2.14.0</version>
                                    <fixVersion>Lustre 2.16.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>14</watches>
                                                                            <comments>
                            <comment id="337567" author="adilger" created="Sun, 12 Jun 2022 03:26:24 +0000"  >&lt;p&gt;This is similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15761&quot; title=&quot;cannot finish MDS recovery&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15761&quot;&gt;&lt;del&gt;LU-15761&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15645&quot; title=&quot;gap in recovery llog should not be a fatal error&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15645&quot;&gt;&lt;del&gt;LU-15645&lt;/del&gt;&lt;/a&gt;, but apparently the llog error is different enough that those patches did not allow the recovery to complete automatically.&lt;/p&gt;</comment>
                            <comment id="337569" author="bzzz" created="Sun, 12 Jun 2022 04:47:28 +0000"  >&lt;p&gt;Where can I find that corrupted llog for analysis?&lt;/p&gt;</comment>
                            <comment id="337576" author="dvensko" created="Sun, 12 Jun 2022 16:03:53 +0000"  >&lt;p&gt;Attached the requested log.&lt;/p&gt;</comment>
                            <comment id="337637" author="pjones" created="Mon, 13 Jun 2022 17:54:53 +0000"  >&lt;p&gt;Mike&lt;/p&gt;

&lt;p&gt;Could you please investigate?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="337753" author="adilger" created="Tue, 14 Jun 2022 19:37:52 +0000"  >&lt;p&gt;The problematic llog file, decoded by &lt;tt&gt;llog_reader&lt;/tt&gt; from master, shows:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;rec #7707 type=106a0000 len=1160 offset 8670472
Header size : 32768      llh_size : 496
Time : Thu Apr  7 10:52:40 2022
Number of records: 2    cat_idx: 9      last_idx: 8244
Target uuid : 
-----------------------
#7707 (1160)updatelog record master_transno:45507493823 batchid:37129103921 flags:0x0 u_index:0 u_count:11 p_count:18
        [0x1840003b3b:0x2fce:0x0] type:create/1 params:2 p_0:0 p_1:1 
        [0x1840003b3b:0x2fce:0x0] type:ref_add/3 params:0 
        [0x1840003b3b:0x2fce:0x0] type:insert/10 params:3 p_0:2 p_1:3 p_2:4 
        [0x1840003b3b:0x2fce:0x0] type:insert/10 params:3 p_0:5 p_1:1 p_2:4 
        [0x1840003b3b:0x2fce:0x0] type:xattr_set/7 params:3 p_0:6 p_1:7 p_2:8 
        [0x1280002b0e:0x1be36:0x0] type:insert/10 params:3 p_0:9 p_1:3 p_2:10 
        [0x1280002b0e:0x1be36:0x0] type:ref_add/3 params:0 
        [0x1840003b3b:0x2fce:0x0] type:xattr_set/7 params:3 p_0:11 p_1:12 p_2:8 
        [0x1280002b0e:0x1be36:0x0] type:attr_set/5 params:1 p_0:13 
        [0x1840003b3b:0x2fce:0x0] type:xattr_set/7 params:3 p_0:14 p_1:15 p_2:8 
        [0x200000001:0x15:0x0] type:write/12 params:2 p_0:16 p_1:17 
        p_0 - 208/\x8E070000000000000000000000000000000000000000000000000000000000000000000000000000005ob\x00000000005ob\x00000000005ob\x0000
        p_1 - 16/\x0E+\x0080120000006\xBE0100000000
        p_2 - 2/.
        p_3 - 0/
        p_4 - 0/
        p_5 - 16384/\x0000000000000400jb\x0000000000@\x0000000000000300jb\x00000000..\x0000000000000C00000000000000trusted.dmv\x00d\x0000000\x00
        p_6 - 0/
        p_7 - 0/
        p_8 - 0/
        p_9 - 0/
        p_10 - 0/
        p_11 - 0/
        p_12 - 0/
        p_13 - 0/
        p_14 - 0/
        p_15 - 0/
        p_16 - 0/
        p_17 - 0/

#8244 (1160)NOT SET updatelog record master_transno:45508195003 batchid:37129117505 flags:0x0 u_index:0 u_count:11 p_count:18
        [0x18400034de:0x6863:0x0] type:create/1 params:2 p_0:0 p_1:1 
        [0x18400034de:0x6863:0x0] type:ref_add/3 params:0 
        [0x18400034de:0x6863:0x0] type:insert/10 params:3 p_0:2 p_1:3 p_2:4 
        [0x18400034de:0x6863:0x0] type:insert/10 params:3 p_0:5 p_1:1 p_2:4 
        [0x18400034de:0x6863:0x0] type:xattr_set/7 params:3 p_0:6 p_1:7 p_2:8 
        [0x128000369b:0x174b8:0x0] type:insert/10 params:3 p_0:9 p_1:3 p_2:10 
        [0x128000369b:0x174b8:0x0] type:ref_add/3 params:0 
        [0x18400034de:0x6863:0x0] type:xattr_set/7 params:3 p_0:11 p_1:12 p_2:8 
        [0x128000369b:0x174b8:0x0] type:attr_set/5 params:1 p_0:13 
        [0x18400034de:0x6863:0x0] type:xattr_set/7 params:3 p_0:14 p_1:15 p_2:8 
        [0x200000001:0x15:0x0] type:write/12 params:2 p_0:16 p_1:17 
        p_0 - 208/\x8E070000000000000000000000000000000000000000000000000000000000000000000000000000H5ob\x00000000H5ob\x00000000H5ob\x000000
        p_1 - 16/\x9B6\x008012000000B8t\x0100000000
        p_2 - 2/.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The recovery from the source MDT reported an error when processing record 8245.&lt;/p&gt;</comment>
                            <comment id="337757" author="tappro" created="Tue, 14 Jun 2022 20:33:45 +0000"  >&lt;p&gt;Yes, I&apos;ve found the same. The log itself is not corrupted: all records are processed by llog_reader without errors, and the llog size also fits the data in the llog. What is not OK so far is &quot;&lt;tt&gt;Number of records: 2&lt;/tt&gt;&quot;. At first I thought that was OK because one record is the llog header, but in fact llog_reader handles that and decrements the count. So there is a mismatch: the llog reports 2 live records but has only 1. I am not sure whether that is a problem or not.&lt;/p&gt;

&lt;p&gt;I am checking the code to see how &lt;tt&gt;-5&lt;/tt&gt; could be returned. My idea is that we might have read past the llog size somehow.&lt;/p&gt;

&lt;p&gt;Another observation is that the server version in use is missing the following patches:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
LU-13974 llog: check stale osp object
LU-15645 obdclass: llog to handle gaps
LU-15761 obdclass: fix locking in llog_cat_refresh()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It does not look like what we see here is a direct result of any of them, but it would still be better to update.&lt;/p&gt;</comment>
                            <comment id="337758" author="tappro" created="Tue, 14 Jun 2022 20:35:49 +0000"  >&lt;p&gt;Just an addition to the comment above: the lgh_count mismatch may be caused by undo operations over the llog if a pad record was involved, so it might indicate that there were undo operations over this llog.&lt;/p&gt;</comment>
                            <comment id="337773" author="adilger" created="Tue, 14 Jun 2022 22:10:36 +0000"  >&lt;p&gt;The debug logs show that the MDT is getting -EIO when trying to read the next llog block, maybe because the llog is short, or it &lt;em&gt;thinks&lt;/em&gt; it is short because of the number of records?&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000004:00000001:1.0:1654980511.161770:0:11204:0:(osp_md_object.c:1345:osp_md_read()) Process leaving (rc=4400 : 4400 : 1130)
00000040:00000001:1.0:1654980511.161772:0:11204:0:(llog_osd.c:1089:llog_osd_next_block()) Process leaving via out (rc=18446744073709551611 : -5 : 0xfffffffffffffffb)
00000040:00000001:1.0:1654980511.161774:0:11204:0:(lustre_log.h:465:llog_next_block()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
00000040:00000001:1.0:1654980511.161775:0:11204:0:(llog.c:572:llog_process_thread()) Process leaving via out (rc=18446744073709551611 : -5 : 0xfffffffffffffffb)
00000040:00080000:1.0:1654980511.161777:0:11204:0:(llog.c:761:llog_process_thread()) stop processing plain 0x2:2147506325:0 index 8245 count 3
00000040:00020000:1.0:1654980511.161778:0:11204:0:(llog.c:774:llog_process_thread()) lfs02-MDT0019-osp-MDT0003 retry remote llog process
00000040:00000001:1.0:1654980511.161781:0:11204:0:(llog.c:897:llog_process_or_fork()) Process leaving (rc=18446744073709551605 : -11 : fffffffffffffff5)
00000040:00000001:1.0:1654980511.161782:0:11204:0:(llog_cat.c:904:llog_cat_process_cb()) Process leaving (rc=18446744073709551605 : -11 : fffffffffffffff5)
00000040:00000001:1.0:1654980511.161784:0:11204:0:(llog.c:742:llog_process_thread()) Process leaving via out (rc=18446744073709551605 : -11 : 0xfffffffffffffff5)
00000040:00080000:1.0:1654980511.161785:0:11204:0:(llog.c:761:llog_process_thread()) stop processing catalog 0x1:2147484688:0 index 9 count 12
00000040:00000001:1.0:1654980511.161790:0:11204:0:(llog.c:897:llog_process_or_fork()) Process leaving (rc=18446744073709551605 : -11 : fffffffffffffff5)
00000040:00000001:1.0:1654980511.161791:0:11204:0:(llog_cat.c:966:llog_cat_process_or_fork()) Process leaving (rc=18446744073709551605 : -11 : fffffffffffffff5)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="337810" author="tappro" created="Wed, 15 Jun 2022 11:45:23 +0000"  >&lt;p&gt;So far it looks like MDT0003 has record #8245 in its local llog bitmap and is trying to read that record from MDT0019. Meanwhile MDT0019 has it neither in the bitmap nor in the file; the file ends exactly at record #8244. That means the write request for #8245 was sent from MDT0003 to MDT0019 (which is why MDT0003 has it in its bitmap) but for some reason was never written on MDT0019, while concurrent llog processing keeps trying to read it.&lt;/p&gt;

&lt;p&gt;As for the loop in the code: llog_osd_next_block() reads the llog up to its end and gets last_idx #8244; since that is less than the expected #8245, it reads again, but the cycle exits because the current offset equals the llog size at that moment, causing the -5 error. I am not yet sure what happens next; if the caller retried the read again and again, the needed update should arrive at some moment and the read would succeed, but we don&apos;t see that. So either the write is simply lost and never happens, or the caller doesn&apos;t retry the read properly.&lt;/p&gt;
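&lt;p&gt;To illustrate, a simplified sketch of that exit condition (illustrative fields only, not the actual llog_osd_next_block() logic):&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
#include &lt;errno.h&gt;

/* Illustrative state only; the real code works on llog handles. */
struct llog_state {
        long long cur_offset;   /* where the next read would start */
        long long file_size;    /* current size of the llog file */
        int       last_idx;     /* last record index present in the file */
};

/* The bitmap says wanted_idx exists, but once the read offset reaches
 * the end of the file there is nothing left to read: retrying the same
 * read keeps returning -EIO until new data actually arrives. */
static int next_block(const struct llog_state *log, int wanted_idx)
{
        if (wanted_idx &gt; log-&gt;last_idx &amp;&amp; log-&gt;cur_offset &gt;= log-&gt;file_size)
                return -EIO;
        return 0;
}

int main(void)
{
        /* numbers taken from the dump above: rec #8244 ends the file */
        struct llog_state log = { 8671632, 8671632, 8244 };

        return next_block(&amp;log, 8245) == -EIO ? 0 : 1;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>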
                            <comment id="337815" author="tappro" created="Wed, 15 Jun 2022 12:12:57 +0000"  >&lt;p&gt;&lt;a href=&quot;https://review.whamcloud.com/47003&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47003&lt;/a&gt; could solve a read arriving before the related write.&lt;br/&gt;
It is still unclear why (or whether) a write could be lost at all, leaving the read to retry in a loop forever.&lt;/p&gt;

&lt;p&gt;So, at the moment the first problem to address is preventing such an llog processing loop when the needed update is (presumably) lost; at the very least we can&apos;t wait for it forever. Perhaps we could check whether there is a related write in flight.&lt;/p&gt;

&lt;p&gt;The second problem to address is the &lt;tt&gt;abort_recovery_mdt&lt;/tt&gt; option: when used, it skips update recovery but doesn&apos;t remove the related update logs, so if we have an llog corruption that needs to be resolved, &lt;tt&gt;abort_recovery_mdt&lt;/tt&gt; doesn&apos;t resolve it and every server remount would get stuck on the same issue. I think if we decide to abort/skip update recovery there is no sense in keeping the update logs, and they should be truncated. This is to be discussed; I am probably missing some cases where they must stay.&lt;/p&gt;</comment>
                            <comment id="337836" author="tappro" created="Wed, 15 Jun 2022 14:14:08 +0000"  >&lt;p&gt;Speaking of &lt;tt&gt;abort_recovery_mdt&lt;/tt&gt;: in &lt;tt&gt;lod_sub_recovery_thread()&lt;/tt&gt; the cycle breaks only when &lt;tt&gt;abort_recovery&lt;/tt&gt; is set, so &lt;tt&gt;abort_recovery_mdt&lt;/tt&gt; doesn&apos;t make things better, but it also doesn&apos;t stop that thread. That could explain why setting &lt;tt&gt;abort_recovery_mdt&lt;/tt&gt; doesn&apos;t help with stopping a stuck MDT-MDT recovery.&lt;/p&gt;
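&lt;p&gt;A sketch of the loop-condition change this implies; the flag names simply mirror the wording above and are assumptions, not verified against the actual thread code:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
/* Hypothetical flags mirroring the wording above. */
struct obd_flags {
        int abort_recovery;     /* abort all recovery */
        int abort_recovery_mdt; /* abort only MDT-MDT update recovery */
};

/* Before: only a full recovery abort stopped the retry loop. */
static int keep_retrying_before(const struct obd_flags *o)
{
        return !o-&gt;abort_recovery;
}

/* After: aborting MDT update recovery alone also stops the loop. */
static int keep_retrying_after(const struct obd_flags *o)
{
        return !o-&gt;abort_recovery &amp;&amp; !o-&gt;abort_recovery_mdt;
}

int main(void)
{
        struct obd_flags o = { 0, 1 };  /* only abort_recovery_mdt set */

        /* before: keeps retrying (1); after: stops (0) */
        return keep_retrying_before(&amp;o) == 1 &amp;&amp;
               keep_retrying_after(&amp;o) == 0 ? 0 : 1;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>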
                            <comment id="337857" author="adilger" created="Wed, 15 Jun 2022 16:03:54 +0000"  >&lt;p&gt;Ok, so two problems then. The llog is permanently short for whatever reason (possibly old non-atomic writes?), so retrying forever will never fix it.  Having a few retries might be OK in case the write is in progress. Maybe we need a separate error code for a short read?&lt;/p&gt;

&lt;p&gt;Secondly, abort_recovery_mdt should definitely abort the MDT recovery. It also makes sense for it to clear, reset, or delete the recovery logs so that they are not re-used on the next recovery. If recovery is already aborted and the filesystem is allowed to be used, then a later replay of old recovery logs may actually be harmful. It also means that any problems in the recovery logs that are not &quot;self healing&quot; will be hit on every restart.&lt;/p&gt;</comment>
                            <comment id="337861" author="tappro" created="Wed, 15 Jun 2022 16:33:22 +0000"  >&lt;p&gt;Yes, the same thoughts here. First I want to figure out why the write was lost; if it wasn&apos;t committed, I&apos;d expect it to be replayed anyway. So probably the retry is not done correctly (e.g. the record id is wrong, or something else), or it could be the result of an &lt;tt&gt;abort_recovery_mdt&lt;/tt&gt; attempt, which skips update recovery, i.e. updates are not re-applied and then it is stuck retrying for missing writes.&lt;/p&gt;

&lt;p&gt;Anyway, whatever the reason, I think the retry mechanism should presume that the write might be lost and stop at some moment. I think that is already covered by Alex&apos;s patch &lt;a href=&quot;https://review.whamcloud.com/#/c/47003/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/47003/&lt;/a&gt;; I need to discuss it with him. It looks to me like with that patch there might be no need to retry at all, at least if we are sure the read error is returned from the other server and is not a network error.&lt;/p&gt;</comment>
                            <comment id="338408" author="gerrit" created="Wed, 22 Jun 2022 18:29:08 +0000"  >&lt;p&gt;&quot;Mike Pershin &amp;lt;mpershin@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/47698&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47698&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; lod: prevent endless retry in recovery thread&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 2c0135579c350a897313b4ff74a1cc7aea200ec7&lt;/p&gt;</comment>
                            <comment id="338410" author="tappro" created="Wed, 22 Jun 2022 18:36:56 +0000"  >&lt;p&gt;This is an initial approach that resolves the endless retry loop we&apos;ve observed, caused by the handling of remote llog short reads. The patch also makes the obd_abort_recovery_mdt option abort update recovery threads as well. That should prevent the known endless retry cases and make manual intervention possible via the abort_recovery_mdt parameter if update recovery gets stuck due to network problems.&lt;/p&gt;

&lt;p&gt;As noted above, more work is needed to remove update llogs when abort_recovery_mdt is set, and it is worth thinking about a limit on the number of retries when the remote server is not accessible. So far I have no idea what to choose as a basis; the recovery hard (or soft) timeout value, maybe?&lt;/p&gt;</comment>
                            <comment id="339730" author="tappro" created="Wed, 6 Jul 2022 19:33:02 +0000"  >&lt;p&gt;OK, with more analysis: I checked &lt;tt&gt;llog_reader&lt;/tt&gt; and found out that it doesn&apos;t report all inconsistencies. A modified version showed the problem in update_log:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
# llog_reader update_log
rec #7707 type=106a0000 len=1160 offset 8670472
in bitmap: rec #8245 is set!
llog has 1 records but header count is 2
Header size : 32768      llh_size : 496
Time : Thu Apr  7 12:52:40 2022
Number of records: 2    cat_idx: 9      last_idx: 8244
Target uuid :
-----------------------
#7707 (1160) id: 0 updatelog record master_transno:45507493823 batchid:37129103921 flags:0x0 u_index:0 u_count:11 p_count:18
    [0x1840003b3b:0x2fce:0x0] type:create/1 params:2 p_0:0 p_1:1
    [0x1840003b3b:0x2fce:0x0] type:ref_add/3 params:0
    [0x1840003b3b:0x2fce:0x0] type:insert/10 params:3 p_0:2 p_1:3 p_2:4
    [0x1840003b3b:0x2fce:0x0] type:insert/10 params:3 p_0:5 p_1:1 p_2:4
    [0x1840003b3b:0x2fce:0x0] type:xattr_set/7 params:3 p_0:6 p_1:7 p_2:8
    [0x1280002b0e:0x1be36:0x0] type:insert/10 params:3 p_0:9 p_1:3 p_2:10
    [0x1280002b0e:0x1be36:0x0] type:ref_add/3 params:0
    [0x1840003b3b:0x2fce:0x0] type:xattr_set/7 params:3 p_0:11 p_1:12 p_2:8
    [0x1280002b0e:0x1be36:0x0] type:attr_set/5 params:1 p_0:13
    [0x1840003b3b:0x2fce:0x0] type:xattr_set/7 params:3 p_0:14 p_1:15 p_2:8
    [0x200000001:0x15:0x0] type:write/12 params:2 p_0:16 p_1:17
    p_0 - 208/\x8E070000000000000000000000000000000000000000000000000000000000000000000000000000005ob\x00000000005ob\x00000000005ob\x0000
    p_1 - 16/\x0E+\x0080120000006\xBE0100000000
    p_2 - 2/.
    p_3 - 0/
    p_4 - 0/
    p_5 - 16384/\x0000000000000400jb\x0000000000@\x0000000000000300jb\x00000000..\x0000000000000C00000000000000trusted.dmv\x00d\x0000000\x00
    p_6 - 0/
    p_7 - 0/
    p_8 - 0/
    p_9 - 0/
    p_10 - 0/
    p_11 - 0/
    p_12 - 0/
    p_13 - 0/
    p_14 - 0/
    p_15 - 0/
    p_16 - 0/
    p_17 - 0/
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is the correct output: the llog has only one record, #7707, but its bitmap also has bit #8245 set. That is why the count is 2, and that is why retrying doesn&apos;t help; it reads the same bitmap and llog data again and again. I will think about how to inject such corruption for test purposes, and will add the modified llog_reader as well.&lt;/p&gt;
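&lt;p&gt;A rough sketch of that kind of consistency check, counting bitmap bits against records actually found in the file (sizes and helper names are illustrative, not the real llog_reader code):&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
#include &lt;stdio.h&gt;

#define BITMAP_BITS 32768  /* illustrative size only */

static int bits_set(const unsigned int *bitmap)
{
        int i, set = 0;

        for (i = 0; i &lt; BITMAP_BITS; i++)
                if (bitmap[i / 32] &amp; (1u &lt;&lt; (i % 32)))
                        set++;
        return set;
}

/* records_found comes from walking the records in the llog file itself */
static void check_llog(const unsigned int *bitmap, int records_found)
{
        int header_count = bits_set(bitmap) - 1;  /* bit 0 is the header itself */

        if (header_count != records_found)
                printf(&quot;llog has %d records but header count is %d\n&quot;,
                       records_found, header_count);
}

int main(void)
{
        unsigned int bitmap[BITMAP_BITS / 32] = { 0 };

        bitmap[0] |= 1u;                         /* header bit */
        bitmap[7707 / 32] |= 1u &lt;&lt; (7707 % 32);  /* rec #7707, present in file */
        bitmap[8245 / 32] |= 1u &lt;&lt; (8245 % 32);  /* rec #8245, missing from file */
        check_llog(bitmap, 1);                   /* only one record actually found */
        return 0;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>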
                            <comment id="340128" author="gerrit" created="Tue, 12 Jul 2022 06:48:55 +0000"  >&lt;p&gt;&quot;Mike Pershin &amp;lt;mpershin@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/47934&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47934&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; llog: llog_reader to detect more corruptions&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: e6f90aa1b8234121d0fc03ce12f98268ff3fcd29&lt;/p&gt;</comment>
                            <comment id="340743" author="gerrit" created="Mon, 18 Jul 2022 20:25:24 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/47698/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47698/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; lod: prevent endless retry in recovery thread&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 1a24dcdce121787428ea820561cfa16ae24bdf82&lt;/p&gt;</comment>
                            <comment id="342391" author="gerrit" created="Wed, 3 Aug 2022 12:47:46 +0000"  >&lt;p&gt;&quot;Mike Pershin &amp;lt;mpershin@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/48112&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/48112&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; llog: more checks in llog_reader&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 4b619366a07813394cfb7abf1d79bb9512605401&lt;/p&gt;</comment>
                            <comment id="342954" author="gerrit" created="Mon, 8 Aug 2022 19:53:53 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/47934/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47934/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; llog: llog_reader to detect more corruptions&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: d914a5b7a49ac6b61c0191a0966d1f684a6957b6&lt;/p&gt;</comment>
                            <comment id="344243" author="gerrit" created="Mon, 22 Aug 2022 14:57:12 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/48286&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/48286&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; lod: prevent endless retry in recovery thread&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7e2533728b5d574ec8638742cbfe574580c0a063&lt;/p&gt;</comment>
                            <comment id="344351" author="gerrit" created="Tue, 23 Aug 2022 09:29:18 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/48310&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/48310&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; llog: llog_reader to detect more corruptions&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 02ea0e325eabc57d95051e79ffe1cc87c2243ced&lt;/p&gt;</comment>
                            <comment id="344664" author="gerrit" created="Thu, 25 Aug 2022 16:11:39 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/48341&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/48341&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; llog: Fix chunk re-read case in llog_process_thread&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a7015dccd3e960516c95510663626f075191d4bd&lt;/p&gt;</comment>
                            <comment id="353788" author="gerrit" created="Tue, 22 Nov 2022 04:22:47 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48112/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48112/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; llog: more checks in llog_reader&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 386ffcdbb4c9b89f798de4c83a51a3f020542c8b&lt;/p&gt;</comment>
                            <comment id="355616" author="pjones" created="Thu, 8 Dec 2022 00:14:59 +0000"  >&lt;p&gt;All patches seem to have merged for 2.16&lt;/p&gt;</comment>
                            <comment id="374520" author="gerrit" created="Mon, 5 Jun 2023 16:11:00 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/51217&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/51217&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; lod: prevent endless retry in recovery thread&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: d66b517c9207dae3dd6da75266e78e50dfbc3f93&lt;/p&gt;</comment>
                            <comment id="374521" author="gerrit" created="Mon, 5 Jun 2023 16:11:01 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/51218&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/51218&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; llog: llog_reader to detect more corruptions&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: e43bf0086e8f80f128cf868b5dca6079872f6a62&lt;/p&gt;</comment>
                            <comment id="374523" author="gerrit" created="Mon, 5 Jun 2023 16:11:02 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/51220&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/51220&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15938&quot; title=&quot;MDT recovery did not finish due to corrupt llog record&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15938&quot;&gt;&lt;del&gt;LU-15938&lt;/del&gt;&lt;/a&gt; llog: more checks in llog_reader&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a0d25b76f6d41e164536a1c1cd46d503338643e7&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="70715">LU-15937</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="69804">LU-15761</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="69092">LU-15645</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="70712">LU-15934</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="63086">LU-15139</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="72621">LU-16203</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="71519">LU-16052</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="44018" name="2022-06-12_15-24-23__DDN-3093_shalustre-lfs02-n26_mdt0019_mdt0003_update_log" size="9277744" author="dvensko" created="Sun, 12 Jun 2022 16:02:52 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i02ryf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>