<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:02:06 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13535] Files truncated/corruption due to lfsck</title>
                <link>https://jira.whamcloud.com/browse/LU-13535</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Following several server crashes (e.g.&#160;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13511&quot; title=&quot;MDS 2.12.4 ASSERTION( top-&amp;gt;loh_hash.next == ((void *)0) &amp;amp;&amp;amp; top-&amp;gt;loh_hash.pprev == ((void *)0) ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13511&quot;&gt;&lt;del&gt;LU-13511&lt;/del&gt;&lt;/a&gt;) when running &lt;tt&gt;lfs migrate&lt;/tt&gt;,&#160;we decided to run lfsck on Fir (Lustre 2.12.4). Today, users are reporting that some of their files have been truncated to 128MB (strangely, this matches the size of the first component of our new default PFL layout).&lt;/p&gt;

&lt;p&gt;What led to this situation is likely the following scenario:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;files were created originally using DoM + PFL (default setup)&lt;/li&gt;
	&lt;li&gt;we changed our default layout to PFL with the first OST component set to 128MB (stripe count 1) to avoid new DoM files&lt;/li&gt;
	&lt;li&gt;because of issues with DoM, we restriped most of the existing DoM files using &lt;tt&gt;lfs migrate -c 1&lt;/tt&gt; (DoM/PFL to plain layout); this was done several months ago without problems&lt;/li&gt;
	&lt;li&gt;two days ago, we started to run lfsck namespace + layout&lt;/li&gt;
	&lt;li&gt;today, users are reporting truncated files, only the ones with plain layout &amp;gt; 128MB&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;I&apos;m wondering if this could be related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13426&quot; title=&quot;&amp;quot;lfs migrate&amp;quot; on DoM component clobbers LOV EA FID&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13426&quot;&gt;&lt;del&gt;LU-13426&lt;/del&gt;&lt;/a&gt;. We consider this issue Sev 2 at least as lfsck is likely corrupting files that have been migrated to plain layout.&lt;/p&gt;

&lt;p&gt;More information below.&lt;/p&gt;

&lt;p&gt;Example with file:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fir-rbh01 ~]# stat /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
  File: &#8216;/fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa&#8217;
  Size: 134217728 	Blocks: 262152     IO Block: 4194304 regular file
Device: e64e03a8h/3863872424d	Inode: 144119811155193635  Links: 1
Access: (0644/-rw-r--r--)  Uid: (65488/ mgebala)   Gid: (52067/astraigh)
Access: 2020-05-07 11:18:32.000000000 -0700
Modify: 2020-04-08 23:24:19.000000000 -0700
Change: 2020-04-29 11:26:53.000000000 -0700
 Birth: -
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fir-rbh01 ~]# lfs getstripe /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
/fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 80
	obdidx		 objid		 objid		 group
	    80	      17475505	    0x10aa7b1	  0x1700000402
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;FID is: &lt;span class=&quot;error&quot;&gt;&amp;#91;0x200043465:0x6f23:0x0&amp;#93;&lt;/span&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fir-rbh01 ~]# lfs path2fid /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
[0x200043465:0x6f23:0x0]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Thanks to Robinhood, we know that the file size was ~132MB and not 128MB.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;MariaDB [robinhood_fir]&amp;gt; select * from ENTRIES where id=&apos;0x200043465:0x6f23:0x0&apos;;
+------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
| id                     | uid     | gid      | size      | blocks | creation_time | last_access | last_mod   | last_mdchange | type | mode | nlink | md_update  | invalid | fileclass | class_update | alert_status | checkdv_status | alert_lstchk | alert_lstalrt | checkdv_lstchk | checkdv_lstsuc | checkdv_out            |
+------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
| 0x200043465:0x6f23:0x0 | mgebala | astraigh | 138323718 | 270176 |    1586607743 |  1586413465 | 1586413459 |    1588184813 | file |  420 |     1 | 1588185083 |       0 | +groups+  |   1588185083 |              | ok             |            0 |             0 |     1588184813 |     1588184813 | 60239190574:1586607743 |
+------------------------+---------+----------+-----------+--------+---------------+-------------+------------+---------------+------+------+-------+------------+---------+-----------+--------------+--------------+----------------+--------------+---------------+----------------+----------------+------------------------+
1 row in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Also the original data_version was 60239190574 but now it&apos;s:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fir-rbh01 ~]# lfs data_version /fir/groups/astraigh/Magda/fachinettie19/fachinetti_CC_DLD/extracted/fachinetti_CCr1oCAr1.k25.ci10.madx5.r1.singleline.fa
30120416758
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This file is on MDT0 and lfsck logs show that something was fixed for this FID &lt;tt&gt;0x200043465:0x6f23:0x0&lt;/tt&gt;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fir-rbh01 ~]# grep 0x200043465:0x6f23:0x0 lfsck.fir-md1-s1.log 
00100000:10000000:24.0:1588797550.743684:0:126810:0:(lfsck_layout.c:4033:lfsck_layout_repair_owner()) fir-MDT0000-osd: layout LFSCK assistant repaired inconsistent file owner for: parent [0x200043465:0x6f23:0x0], child [0x1340000401:0x10bc4c3:0x0], OST-index 65, stripe-index 1, old owner 0/0, new owner 65488/52067: rc = 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Robinhood also shows that the file was previously striped on two OSTs, but Robinhood doesn&apos;t support DoM or migration, so that is from the original striping info:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;MariaDB [robinhood_fir]&amp;gt; select * from STRIPE_ITEMS where id=&apos;0x200043465:0x6f23:0x0&apos;;
+------------------------+--------------+--------+----------------------+
| id                     | stripe_index | ostidx | details              |
+------------------------+--------------+--------+----------------------+
| 0x200043465:0x6f23:0x0 |            0 |     64 | (binary data)        |
| 0x200043465:0x6f23:0x0 |            1 |     65 | (binary data)        |
+------------------------+--------------+--------+----------------------+
2 rows in set (0.00 sec)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;LFSCK layout has fixed many files like that:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fir-hn01 sthiell.root]# clush -w@mds -R exec -bL &apos;tgt=$(printf fir-MDT%%04x %n); ssh %h lctl get_param -n mdd.$tgt.lfsck_layout&apos; | grep status
fir-md1-s[1-4]: status: completed
[root@fir-hn01 sthiell.root]# clush -w@mds -R exec -bL &apos;tgt=$(printf fir-MDT%%04x %n); ssh %h lctl get_param -n mdd.$tgt.lfsck_layout&apos; | grep repaired
fir-md1-s[1,4]: repaired_dangling: 0
fir-md1-s[2-3]: repaired_dangling: 1
fir-md1-s[1-4]: repaired_unmatched_pair: 0
fir-md1-s[1-4]: repaired_multiple_referenced: 0
fir-md1-s[1-4]: repaired_orphan: 0
fir-md1-s1: repaired_inconsistent_owner: 10494922
fir-md1-s2: repaired_inconsistent_owner: 26336224
fir-md1-s3: repaired_inconsistent_owner: 36300505
fir-md1-s4: repaired_inconsistent_owner: 15102845
fir-md1-s1: repaired_others: 429814
fir-md1-s2: repaired_others: 46955127
fir-md1-s3: repaired_others: 0
fir-md1-s4: repaired_others: 1716650
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Can you confirm that this could be due to LFSCK? I&apos;m not sure why &quot;inconsistent file owner&quot; would corrupt files, but this is the only pointer that we have now. If that&apos;s the case, do you think there is a way to repair what LFSCK has &quot;fixed&quot;?&lt;/p&gt;</description>
                <environment>CentOS 7.6</environment>
        <key id="59092">LU-13535</key>
            <summary>Files truncated/corruption due to lfsck</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="tappro">Mikhail Pershin</assignee>
                                    <reporter username="sthiell">Stephane Thiell</reporter>
                        <labels>
                    </labels>
                <created>Thu, 7 May 2020 18:51:43 +0000</created>
                <updated>Fri, 9 Jul 2021 17:43:14 +0000</updated>
                            <resolved>Wed, 27 May 2020 14:52:08 +0000</resolved>
                                    <version>Lustre 2.12.4</version>
                                    <fixVersion>Lustre 2.14.0</fixVersion>
                    <fixVersion>Lustre 2.12.5</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="269597" author="sthiell" created="Thu, 7 May 2020 19:15:20 +0000"  >&lt;p&gt;Attached debug logs with lfsck for fir-MDT0000 as&#160;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/34860/34860_dk.fir-md1-s1.log.gz&quot; title=&quot;dk.fir-md1-s1.log.gz attached to LU-13535&quot;&gt;dk.fir-md1-s1.log.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt; and fir-MDT0001 as&#160;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/34861/34861_dk.fir-md1-s2.log.gz&quot; title=&quot;dk.fir-md1-s2.log.gz attached to LU-13535&quot;&gt;dk.fir-md1-s2.log.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;  The example file in the case is located on fir-MDT0000.&lt;/p&gt;</comment>
                            <comment id="269630" author="sthiell" created="Fri, 8 May 2020 01:53:41 +0000"  >&lt;p&gt;We&apos;re no longer sure that all of these files were originally created with DoM. More and more users are reporting truncated files. Users are even reporting truncated files that were created recently with the non-DoM PFL config, where only the first stripe (128MiB) remains after LFSCK was run. A common factor seems to be that the parent directories of these files were recently migrated to another MDT (MDT1). Could an &lt;tt&gt;lfs migrate -m&lt;/tt&gt; followed by a later lfsck_layout truncate PFL files like that to a plain layout (with the first component only)?&lt;/p&gt;

&lt;p&gt;Our default PFL config:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fir-rbh01 ~]# lfs getstripe -d /fir
  lcm_layout_gen:    0
  lcm_mirror_count:  1
  lcm_entry_count:   3
    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 0
    lcme_extent.e_end:   134217728
      stripe_count:  1       stripe_size:   4194304       pattern:       raid0       stripe_offset: -1

    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 134217728
    lcme_extent.e_end:   137438953472
      stripe_count:  2       stripe_size:   4194304       pattern:       raid0       stripe_offset: -1

    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 137438953472
    lcme_extent.e_end:   EOF
      stripe_count:  4       stripe_size:   4194304       pattern:       raid0       stripe_offset: -1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="269706" author="pjones" created="Fri, 8 May 2020 17:45:44 +0000"  >&lt;p&gt;Mike&lt;/p&gt;

&lt;p&gt;Could you please advise&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="269821" author="tappro" created="Mon, 11 May 2020 14:34:51 +0000"  >&lt;p&gt;Yes, I am working on that.&lt;/p&gt;</comment>
                            <comment id="269852" author="sthiell" created="Mon, 11 May 2020 18:34:11 +0000"  >&lt;p&gt;Thanks, Mike. This seems pretty bad.&lt;/p&gt;

&lt;p&gt;It looks like all files in some MDT-migrated directories have lost their PFL layout after running LFSCK. They just seem to have a plain layout now.  Files &amp;lt; 128MiB (size of our first PFL component) are not truncated and are still usable, albeit with a plain layout, but the larger files are truncated. &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/sad.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;Example of a small file that previously had a PFL layout and now has a plain layout:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
[root@fir-rbh01 job034]# lfs getstripe /fir/users/alpays/hongli-backup/GCGR/relion_gcgr_vpp_20180212_tem4/Class2D/job034/run_it025_optimiser.star
/fir/users/alpays/hongli-backup/GCGR/relion_gcgr_vpp_20180212_tem4/Class2D/job034/run_it025_optimiser.star
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 57
	obdidx		 objid		 objid		 group
	    57	      13579585	     0xcf3541	  0x1140000400
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Its directory still has the PFL config:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fir-rbh01 job034]# lfs getstripe -d /fir/users/alpays/hongli-backup/GCGR/relion_gcgr_vpp_20180212_tem4/Class2D/job034
  lcm_layout_gen:    0
  lcm_mirror_count:  1
  lcm_entry_count:   3
    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 0
    lcme_extent.e_end:   134217728
      stripe_count:  1       stripe_size:   4194304       pattern:       raid0       stripe_offset: -1

    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 134217728
    lcme_extent.e_end:   137438953472
      stripe_count:  2       stripe_size:   4194304       pattern:       raid0       stripe_offset: -1

    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 137438953472
    lcme_extent.e_end:   EOF
      stripe_count:  4       stripe_size:   4194304       pattern:       raid0       stripe_offset: -1

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;How is that possible?&lt;/p&gt;</comment>
                            <comment id="269892" author="pjones" created="Tue, 12 May 2020 00:05:00 +0000"  >&lt;p&gt;Details from email&lt;/p&gt;

&lt;p&gt;&quot;I&#8217;m contacting you regarding &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13535&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.whamcloud.com/browse/LU-13535&lt;/a&gt; &#8220;Files truncated/corruption due to lfsck&#8221;&lt;/p&gt;

&lt;p&gt;We wanted to send you this email to provide updated information about our specific situation.&lt;/p&gt;

&lt;p&gt;A single LFSCK run has changed the layout of millions of files on Fir to a plain layout of 1 OST, with 128 MiB maximum. Larger files have been truncated, so today we know that lfsck has corrupted the content of about 215k files.  At this stage, it looks like it has only happened in directories which had previously been migrated to another MDT (lfs migrate -m 1), directories which worked fine until we ran LFSCK namespace + layout.&lt;/p&gt;

&lt;p&gt;We have indeed scanned the filesystem (Fir on Sherlock) for files that are 128MiB in size with a plain layout of 1 OST, which is our way to detect when a file has lost its default PFL layout and very, very likely been truncated by LFSCK last week.  Over the weekend, we ran lfs find -size 128M -c 1 on the whole filesystem (665M inodes) and it has completed: we have found 214,695 files total that have been truncated to 128MiB after this LFSCK run. All files I checked manually have indeed been truncated/corrupted. Also, users are reporting that their quota is still showing the previous volume used, so we think there could be a chance that the objects are still somehow on the OSTs. Some users have lost tens of TB of scratch research data due to very large files being truncated.&lt;/p&gt;

&lt;p&gt;Thanks for assigning Mike to this ticket. Any insights would be appreciated as soon as possible so we can adjust the communication to our users. My guess is that the layouts are lost, but perhaps you will find a way to reattach the component to these files?&quot;&lt;/p&gt;</comment>
                            <comment id="269918" author="tappro" created="Tue, 12 May 2020 11:36:55 +0000"  >&lt;p&gt;Stephane, could you please get the extended striping info for the affected files via &lt;tt&gt;lfs getstripe -R -v&lt;/tt&gt;? It will show all fields in the layout and may give some clues. I am trying to reproduce that behavior locally and am also inspecting the &lt;tt&gt;lfsck&lt;/tt&gt; code in 2.12.4 right now.&lt;/p&gt;

&lt;p&gt;Also, please provide the exact &lt;tt&gt;lfsck&lt;/tt&gt; command used.&lt;/p&gt;</comment>
                            <comment id="269923" author="tappro" created="Tue, 12 May 2020 13:53:01 +0000"  >&lt;p&gt;I managed to reproduce the bug and have found why it happens; a fix for lfsck is on the way. I am now trying to figure out what can be done for the lost stripes.&lt;/p&gt;</comment>
                            <comment id="269947" author="sthiell" created="Tue, 12 May 2020 16:03:53 +0000"  >&lt;p&gt;Thanks Mike, it is great news that you were able to reproduce it yourself! Let us know if you find a way to reattach the lost stripes; we have moved/quarantined the files into directories using the same project IDs, so the FIDs should be the same.&lt;/p&gt;

&lt;p&gt;I&apos;m attaching the output of &lt;tt&gt;lfs getstripe -R -v&lt;/tt&gt; on all affected files (the truncated ones only) as  &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/34894/34894_fir_lfsck_trunc_getstripe_all.log.gz&quot; title=&quot;fir_lfsck_trunc_getstripe_all.log.gz attached to LU-13535&quot;&gt;fir_lfsck_trunc_getstripe_all.log.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt; &lt;/p&gt;

&lt;p&gt;As for lfsck, we started it with &lt;tt&gt;lctl lfsck_start -M fir-MDTxxx -t namespace&lt;/tt&gt; first on all 4 MDTs, and then once done, I did &lt;tt&gt;lctl lfsck_start -M fir-MDTxxx -t layout&lt;/tt&gt; on all 4 MDTs.&lt;/p&gt;</comment>
                            <comment id="269983" author="gerrit" created="Tue, 12 May 2020 21:33:26 +0000"  >&lt;p&gt;Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/38584&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38584&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13535&quot; title=&quot;Files truncated/corruption due to lfsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13535&quot;&gt;&lt;del&gt;LU-13535&lt;/del&gt;&lt;/a&gt; lfsck: fix possible PFL layout corruption&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: db1aa8f1880162e186467a9a52da21fb319cb1b2&lt;/p&gt;</comment>
                            <comment id="269984" author="gerrit" created="Tue, 12 May 2020 21:38:37 +0000"  >&lt;p&gt;Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/38585&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38585&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13535&quot; title=&quot;Files truncated/corruption due to lfsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13535&quot;&gt;&lt;del&gt;LU-13535&lt;/del&gt;&lt;/a&gt; lfsck: fix possible PFL layout corruption&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a29eecf94a2fb0642256e600074173428ccf5304&lt;/p&gt;</comment>
                            <comment id="270027" author="sthiell" created="Wed, 13 May 2020 05:20:46 +0000"  >&lt;p&gt;Hi Mike &#8211; This is really awesome that you seem to have found the source of the problem! Congrats and many thanks for that! For our truncated files, is there any chance that the old composite layout is still around somewhere? We have been working today with our users and hopefully most of the truncated files are scratch files that can be regenerated, but still, unfortunately, a few of them were not transferred from this filesystem to longer-term storage and we would like to know if they could somehow still be &quot;fixed&quot;. Thx.&lt;/p&gt;</comment>
                            <comment id="270061" author="tappro" created="Wed, 13 May 2020 14:58:15 +0000"  >&lt;p&gt;Stephane, was the layout of these files the FS default, so that they all have the same one, or are there many different cases?&lt;/p&gt;</comment>
                            <comment id="270087" author="sthiell" created="Wed, 13 May 2020 17:00:21 +0000"  >&lt;p&gt;Prior to the lfsck layout incident, these files were likely all using the PFL layout defined by their parent directories. We have set up the following PFL layout on all directories, and then it&apos;s inherited for new directories (as I don&apos;t think we can set a PFL layout as FS-default):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lfs setstripe -E 128M -c 1 -S 4M -E 128G -c 2 -S 4M -E -1 -c 4 -S 4M /fir
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The FS-default is still plain layout of 1 stripe, as we haven&apos;t modified it. With tunefs.lustre on MDT0 I can see:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lov.stripecount=1 lov.stripesize=1048576 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="270604" author="sthiell" created="Tue, 19 May 2020 21:22:34 +0000"  >&lt;p&gt;We have been able to recover the composite layout of all our truncated files thanks to Mike! Feel free to close this ticket once the lfsck patch has landed (in 2.12 also please!!).&lt;/p&gt;</comment>
                            <comment id="271224" author="gerrit" created="Wed, 27 May 2020 05:03:33 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/38584/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38584/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13535&quot; title=&quot;Files truncated/corruption due to lfsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13535&quot;&gt;&lt;del&gt;LU-13535&lt;/del&gt;&lt;/a&gt; lfsck: fix possible PFL layout corruption&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: be009cb4a73b3bef7302083bec7d1d6289d515b7&lt;/p&gt;</comment>
                            <comment id="271290" author="pjones" created="Wed, 27 May 2020 14:52:08 +0000"  >&lt;p&gt;Landed for 2.14&lt;/p&gt;</comment>
                            <comment id="271317" author="gerrit" created="Wed, 27 May 2020 17:29:56 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/38585/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38585/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13535&quot; title=&quot;Files truncated/corruption due to lfsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13535&quot;&gt;&lt;del&gt;LU-13535&lt;/del&gt;&lt;/a&gt; lfsck: fix possible PFL layout corruption&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 775ce1c26c843d9ef9e6919f85e5284828762095&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="65057">LU-14837</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="59407">LU-13619</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="54984">LU-12013</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="34860" name="dk.fir-md1-s1.log.gz" size="10428036" author="sthiell" created="Thu, 7 May 2020 19:11:40 +0000"/>
                            <attachment id="34861" name="dk.fir-md1-s2.log.gz" size="9423307" author="sthiell" created="Thu, 7 May 2020 19:13:53 +0000"/>
                            <attachment id="34894" name="fir_lfsck_trunc_getstripe_all.log.gz" size="6819993" author="sthiell" created="Tue, 12 May 2020 15:58:26 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00zsn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>