<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:25:13 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2440] git repositories get corrupted</title>
                <link>https://jira.whamcloud.com/browse/LU-2440</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I have received notice of an issue a user at TACC is having:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am having a problem where my git repositories on TACC systems get corrupted, leading to errors like this:&lt;/p&gt;&lt;/blockquote&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;login4$ git status
fatal: index file smaller than expected
login4$ git status
error: object file .git/objects/01/ee9f4bfe74aaee027a3e8418d70d337e1235d3
is empty
fatal: loose object 01ee9f4bfe74aaee027a3e8418d70d337e1235d3 (stored in
.git/objects/01/ee9f4bfe74aaee027a3e8418d70d337e1235d3) is corrupt
login4$ git status
error: object file .git/objects/8d/6083737dae5cb67906ac26702465ca2d70bc95
is empty
fatal: loose object 8d6083737dae5cb67906ac26702465ca2d70bc95 (stored in
.git/objects/8d/6083737dae5cb67906ac26702465ca2d70bc95) is corrupt
login4$ git status
error: object file .git/objects/bc/61c57143652fbf198de898ca7bb9d5659a5de0
is empty
fatal: loose object bc61c57143652fbf198de898ca7bb9d5659a5de0 (stored in
.git/objects/bc/61c57143652fbf198de898ca7bb9d5659a5de0) is corrupt
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>EL5</environment>
        <key id="16864">LU-2440</key>
            <summary>git repositories get corrupted</summary>
                <type id="6" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11315&amp;avatarType=issuetype">Story</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="rhenwood">Richard Henwood</assignee>
                                    <reporter username="rhenwood">Richard Henwood</reporter>
                        <labels>
                    </labels>
                <created>Thu, 6 Dec 2012 16:04:56 +0000</created>
                <updated>Tue, 30 May 2017 05:21:40 +0000</updated>
                            <resolved>Tue, 30 May 2017 05:21:40 +0000</resolved>
                                    <version>Lustre 1.8.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="48885" author="adilger" created="Thu, 6 Dec 2012 18:04:11 +0000"  >&lt;p&gt;If the git objects are zero length, there could be a number of potential causes.  I had a similar problem on my local filesystem due to it filling up, and then Git proceeded to create and reference new objects that were empty.&lt;/p&gt;
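
&lt;p&gt;One quick way to check whether a repository is in this state is to scan for zero-length loose objects. The following is a minimal Python sketch, not part of git or Lustre: the function name find_empty_loose_objects is made up for illustration, and it assumes the usual loose-object layout of a SHA-1 repository (objects/xx/ followed by a 38-character hex file name).&lt;/p&gt;

```python
import os
import re
import tempfile

# Assumption: SHA-1 repository, so a loose object is stored as
# objects/xx/yyyy... with 2 + 38 lowercase hex characters.
FANOUT_DIR = re.compile(r"[0-9a-f]{2}")
OBJECT_NAME = re.compile(r"[0-9a-f]{38}")


def find_empty_loose_objects(git_dir):
    """Return sorted IDs of zero-length loose objects under git_dir/objects."""
    objects_dir = os.path.join(git_dir, "objects")
    empty = []
    if not os.path.isdir(objects_dir):
        return empty
    for fanout in os.listdir(objects_dir):
        fanout_path = os.path.join(objects_dir, fanout)
        # Skip pack/, info/ and anything else that is not a fan-out directory.
        if not FANOUT_DIR.fullmatch(fanout) or not os.path.isdir(fanout_path):
            continue
        for name in os.listdir(fanout_path):
            path = os.path.join(fanout_path, name)
            if OBJECT_NAME.fullmatch(name) and os.path.getsize(path) == 0:
                empty.append(fanout + name)
    return sorted(empty)


if __name__ == "__main__":
    # Self-contained demo on a fake .git layout; point the function at a
    # real .git directory to check an actual repository.
    with tempfile.TemporaryDirectory() as git_dir:
        fanout = os.path.join(git_dir, "objects", "01")
        os.makedirs(fanout)
        open(os.path.join(fanout, "e" * 38), "wb").close()  # empty object
        with open(os.path.join(fanout, "f" * 38), "wb") as handle:
            handle.write(b"not empty")
        print(find_empty_loose_objects(git_dir))
```

&lt;p&gt;Run against the .git directory of an affected repository, this should list the same object IDs that git reports as empty.&lt;/p&gt;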

&lt;p&gt;Is there possibly a quota limit in effect, a full OST, or anything else of that kind?  Alternatively, if git was in the middle of updating and the client crashed or was evicted before the data was written to disk, something similar could also happen.&lt;/p&gt;

&lt;p&gt;Conversely, if there are errors on the console (dmesg) for that client when these objects are accessed that would clearly indicate some problem with Lustre.&lt;/p&gt;

&lt;p&gt;Is the problem reproducible (i.e. some sequence of steps during normal operation results in a broken repository)?  At one time in the distant past, there was a metadata bug that CVS triggered, and we have a CVS test as the reproducer (sanity.sh test_99&amp;#91;a-f&amp;#93;).  If we can distill the steps that git takes to reproduce this problem (preferably starting with a new, empty archive to save space), that would definitely speed up understanding and fixing any problem found.&lt;/p&gt;</comment>
                            <comment id="51094" author="brian" created="Thu, 24 Jan 2013 08:50:15 +0000"  >&lt;p&gt;It is probably worth mentioning that there is a nice long thread on the git mailing list about other issues of corruption and just general mis-behaviour of git on Lustre.  At least some of it might be due to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2276&quot; title=&quot;is open() idempotent in regards to being restarted after a signal interrupts it?&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2276&quot;&gt;&lt;del&gt;LU-2276&lt;/del&gt;&lt;/a&gt;, and yet some other complaints seem to sound like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-305&quot; title=&quot;utime() fails with EINTR : not conform to POSIX standard&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-305&quot;&gt;&lt;del&gt;LU-305&lt;/del&gt;&lt;/a&gt; and some others unknown as of yet.&lt;/p&gt;

&lt;p&gt;Perhaps using git on lustre ought to be one of our regression tests.&lt;/p&gt;</comment>
                            <comment id="51110" author="rhenwood" created="Thu, 24 Jan 2013 12:10:44 +0000"  >&lt;p&gt;Thanks Brian, this is useful.&lt;/p&gt;

&lt;p&gt;I&apos;m playing around at the moment, trying to create a reproducer.&lt;/p&gt;</comment>
                            <comment id="51732" author="mboisson" created="Mon, 4 Feb 2013 14:01:02 +0000"  >&lt;p&gt;One of our users has what seems to be the exact same problem with git. We are running lustre clients 1.8.8, and 1.8.4 and 1.8.5 servers. The user is able to reproduce the problem every now and then by running a &quot;git gc&quot; in a crontab. The problem seems to appear once or twice per week.&lt;/p&gt;</comment>
                            <comment id="51737" author="rhenwood" created="Mon, 4 Feb 2013 14:22:58 +0000"  >&lt;p&gt;Thanks for this update. I have been able to reproduce this issue; leaving a repo overnight did it in my case.&lt;/p&gt;

&lt;p&gt;I&apos;m following up to see if this is also an issue with Master, and trying to shorten the time to reproduce.&lt;/p&gt;</comment>
                            <comment id="51811" author="adilger" created="Tue, 5 Feb 2013 14:45:35 +0000"  >&lt;p&gt;Richard,&lt;br/&gt;
&quot;overnight&quot; might just be the time it takes for idle DLM locks to be cancelled.  What would be useful is:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;enable full debug logging, like &lt;tt&gt;lctl set_param debug=-1&lt;/tt&gt;&lt;/li&gt;
	&lt;li&gt;do &quot;git update&quot; or &quot;git gc&quot; or whatever is the trigger&lt;/li&gt;
	&lt;li&gt;dump debug logs, like &lt;tt&gt;lctl dk /tmp/git_update.log&lt;/tt&gt;&lt;/li&gt;
	&lt;li&gt;verify repository is not corrupted &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/help_16.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/li&gt;
	&lt;li&gt;get checksums of all of the files under .git, like &lt;tt&gt;find .git -type f | xargs md5sum &amp;gt; git_before.md5sum&lt;/tt&gt;&lt;/li&gt;
	&lt;li&gt;cancel all of the DLM locks on the client, like &lt;tt&gt;lctl set_param ldlm.namespaces.*.lru_size=clear&lt;/tt&gt;&lt;/li&gt;
	&lt;li&gt;dump debug logs, like &lt;tt&gt;lctl dk /tmp/git_dlm_cancel.log&lt;/tt&gt;&lt;/li&gt;
	&lt;li&gt;get checksums of all the .git files again (into a new file)&lt;/li&gt;
	&lt;li&gt;compare checksums of before and after lock cancel&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;If the checksums are different, then there is some problem with the cache flushing or similar.&lt;/p&gt;
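
&lt;p&gt;The checksum steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names snapshot_md5 and compare_snapshots are hypothetical, and the sketch does not cancel any DLM locks itself, so the lctl commands listed above still have to be run between the two snapshots.&lt;/p&gt;

```python
import hashlib
import os
import tempfile


def snapshot_md5(root):
    """Map each file under root (path relative to root) to its MD5 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.md5(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def compare_snapshots(before, after):
    """Return (added, removed, changed) as sorted lists of relative paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    common = set(before).intersection(after)
    changed = sorted(path for path in common if before[path] != after[path])
    return added, removed, changed


if __name__ == "__main__":
    # Self-contained demo on a temporary directory standing in for .git.
    with tempfile.TemporaryDirectory() as root:
        index_path = os.path.join(root, "index")
        with open(index_path, "wb") as handle:
            handle.write(b"before lock cancel")
        before = snapshot_md5(root)

        # On a real client, the DLM lock cancel would happen at this point.
        with open(index_path, "wb") as handle:
            handle.write(b"after lock cancel")
        after = snapshot_md5(root)

        added, removed, changed = compare_snapshots(before, after)
        print("added:", added)
        print("removed:", removed)
        print("changed:", changed)
```

&lt;p&gt;Any entry in the added, removed or changed lists after a lock cancel, with no git command run in between, would point at the cache flushing problem described above.&lt;/p&gt;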

&lt;p&gt;However, without a more specific reproducer, it won&apos;t be very easy to isolate when this is happening.&lt;/p&gt;</comment>
                            <comment id="52763" author="rhenwood" created="Wed, 20 Feb 2013 14:41:24 +0000"  >&lt;p&gt;I&apos;ve seen a transient corruption:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ git fsck
error: 6419686540529fe8937aa6a7f01989109c7be7c6: object corrupt or missing
error: af074dd53d04eda6d5db0f2368f0390d4060ca70: object corrupt or missing
error: ffdec01d82af9ca59bec7b1fdf941e4a8d84db2e: object corrupt or missing
fatal: index file smaller than expected
login2$
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ diff ~/git_before.md5sum ~/git_after.md5sum
0a1
&amp;gt; 2b1bc9e225f27e10a228974b47281ff9  .git/refs/heads/master
20a22,24
&amp;gt; 122257de7cf6016e026760c7791d1d5a  .git/objects/af/074dd53d04eda6d5db0f2368f0390d4060ca70
&amp;gt; be5e8c66fd02b1d9662a6deb2d4d325a  .git/objects/64/19686540529fe8937aa6a7f01989109c7be7c6
&amp;gt; 9963ce604411643548921a66fc0a67d2  .git/objects/ff/dec01d82af9ca59bec7b1fdf941e4a8d84db2e
29,32c33,36
&amp;lt; 1414af68fbd29b3dafa9152a49453010  .git/logs/refs/heads/master
&amp;lt; 1414af68fbd29b3dafa9152a49453010  .git/logs/HEAD
&amp;lt; fde41e17523926db4a56131b9a313c54  .git/COMMIT_EDITMSG
&amp;lt; 99e1f2253855d6cf020dce0fff06fdfd  .git/index
---
&amp;gt; 1f8f81bd507eac9467924aab7cbe9995  .git/logs/refs/heads/master
&amp;gt; 1f8f81bd507eac9467924aab7cbe9995  .git/logs/HEAD
&amp;gt; 42bee3bb1f71aec0b3e61f0fcf4f65d6  .git/COMMIT_EDITMSG
&amp;gt; a73a56d1dc1af89cbcb0abc836864f82  .git/index
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and then try git fsck again:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;login2$ git fsck
login2$
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I notice the difference here is &lt;tt&gt;object corrupt or missing&lt;/tt&gt; compared to the reported &lt;tt&gt;$git status ... loose object ... is corrupt&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;NOTE: These results are from a machine where &lt;tt&gt;sync&lt;/tt&gt; is not available to the user.&lt;/p&gt;</comment>
                            <comment id="57310" author="mboisson" created="Tue, 30 Apr 2013 13:24:43 +0000"  >&lt;p&gt;Hi,&lt;br/&gt;
Just as a note, we had one user who had this problem. About 6 weeks ago, our sysadmin increased two parameters on the lustre clients :&lt;br/&gt;
LRU&lt;br/&gt;
MaxDirtyMegabytes&lt;/p&gt;

&lt;p&gt;He increased the LRU to 10 000, and MaxDirtyMegabytes to 256MB. &lt;/p&gt;

&lt;p&gt;Since then, our user has not seen the error again.&lt;/p&gt;

&lt;p&gt;Might be worth investigating those parameters.&lt;/p&gt;

&lt;p&gt;Regards,&lt;/p&gt;

&lt;p&gt;Maxime Boissonneault&lt;/p&gt;</comment>
                            <comment id="57325" author="rhenwood" created="Tue, 30 Apr 2013 15:38:51 +0000"  >&lt;p&gt;Thanks for this info Maxime.&lt;/p&gt;

&lt;p&gt;Can you provide the version of Lustre you observed this problem with, the mount options, and whether sync is enabled on your machine?&lt;/p&gt;

&lt;p&gt;If you can readily reproduce this problem - and can share a reproducible configuration - that would be very helpful.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
Richard&lt;/p&gt;</comment>
                            <comment id="57326" author="mboisson" created="Tue, 30 Apr 2013 15:45:32 +0000"  >&lt;p&gt;Hi Richard,&lt;br/&gt;
Here is the Lustre version:&lt;br/&gt;
&amp;#91;mboisson@colosse2 ~&amp;#93;$ cat /proc/fs/lustre/version&lt;br/&gt;
lustre: 1.8.8&lt;br/&gt;
kernel: patchless_client&lt;br/&gt;
build:  jenkins-wc1-gbc88c4c-PRISTINE-2.6.18-308.16.1.el5&lt;/p&gt;

&lt;p&gt;The mount options are : &lt;br/&gt;
mds2-ib0@o2ib,mds1-ib0@o2ib:/lustre1 on /lustre type lustre (rw,noauto,localflock)&lt;br/&gt;
mds4-ib0@o2ib,mds3-ib0@o2ib:/lustre2 on /lustre2 type lustre (rw,noauto,localflock)&lt;br/&gt;
10.225.16.3@o2ib0:/fs1 on /lustre3 type lustre (rw,noauto,localflock)&lt;/p&gt;

&lt;p&gt;How do I know if sync is enabled ?&lt;/p&gt;

&lt;p&gt;We cannot readily reproduce the problem. First, it does not seem to happen anymore, and second, when it did happen, it was at random times.&lt;/p&gt;

&lt;p&gt;Best,&lt;/p&gt;

&lt;p&gt;Maxime&lt;/p&gt;</comment>
                            <comment id="57573" author="rhenwood" created="Thu, 2 May 2013 19:12:13 +0000"  >&lt;p&gt;On the topic of sync: I&apos;ve seen clients on some systems where /bin/sync is executable only by root.&lt;/p&gt;</comment>
                            <comment id="57664" author="mboisson" created="Fri, 3 May 2013 17:35:38 +0000"  >&lt;p&gt;Hi Richard,&lt;br/&gt;
/bin/sync can be executed by others on our system.&lt;/p&gt;

&lt;p&gt;Best,&lt;/p&gt;

&lt;p&gt;Maxime&lt;/p&gt;</comment>
                            <comment id="197514" author="adilger" created="Tue, 30 May 2017 05:21:40 +0000"  >&lt;p&gt;Close old issue.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="16567">LU-2276</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="10782">LU-305</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvdcn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5768</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>