<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:44:26 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4626] directories missing after upgrade from 1.8 to 2.3 then 2.4.1 then 2.4.2</title>
                <link>https://jira.whamcloud.com/browse/LU-4626</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have a test file system that was created with Lustre 1.8 (or possibly 1.6), then briefly upgraded to 2.3, then 2.4.1, and now 2.4.2. On this file system a few directories are inaccessible after the latest upgrade. I believe they were accessible while we were still running 2.4.1, but I&apos;m not sure.&lt;/p&gt;

&lt;p&gt;All clients are currently running 1.8.9.&lt;/p&gt;

&lt;p&gt;Running ls on one of the directories produces an error on the command line, but I could find nothing in any of the system logs.&lt;/p&gt;

&lt;p&gt;[bnh65367@p60-storage ~]$ ls -l /mnt/play01 |grep p60&lt;br/&gt;
ls: cannot access /mnt/play01/p45: No such file or directory&lt;br/&gt;
ls: cannot access /mnt/play01/p60: No such file or directory&lt;br/&gt;
d??????????  ? ?           ?                 ?            ? p60&lt;br/&gt;
[bnh65367@p60-storage ~]$ ls -l /mnt/play01/p60&lt;br/&gt;
ls: cannot access /mnt/play01/p60: No such file or directory&lt;br/&gt;
[bnh65367@p60-storage ~]$&lt;/p&gt;

&lt;p&gt;Trying to touch one of the missing directories produces the following message on the MDS and an input/output error on the client command line.&lt;/p&gt;

&lt;p&gt;Feb 11 19:13:23 cs04r-sc-mds02-03 kernel: LustreError: 14367:0:(mdt_open.c:1694:mdt_reint_open()) play01-MDT0000: name p60 present, but fid [0x45828f:0x7f3b41ef:0x0] invalid&lt;/p&gt;

&lt;p&gt;I&apos;m trying to understand whether this is expected, and whether we are likely to see it if we upgrade our production file systems directly from 1.8 to 2.4.2. And of course we need to fix it. To me it looks like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3934&quot; title=&quot;Directories gone missing after 2.4 update&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3934&quot;&gt;&lt;del&gt;LU-3934&lt;/del&gt;&lt;/a&gt; could be related, though if I understand that bug correctly, it should already be fixed. Maybe it will fix itself (by automatically starting OI scrub)?&lt;/p&gt;

&lt;p&gt;Is this sufficiently different from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3934&quot; title=&quot;Directories gone missing after 2.4 update&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3934&quot;&gt;&lt;del&gt;LU-3934&lt;/del&gt;&lt;/a&gt; and unexpected that I should open a new ticket?&lt;/p&gt;

&lt;p&gt;The file system was upgraded a few hours ago. lctl get_param &apos;osd-ldiskfs.&amp;#42;.oi_scrub&apos; on the MDS reports the status init for both the MDT and the MGT (see below). Does this mean it hasn&apos;t been started and I should start it? How would I start it?&lt;/p&gt;

&lt;p&gt;sudo lctl get_param &apos;osd-ldiskfs.&amp;#42;.oi_scrub&apos;&lt;br/&gt;
osd-ldiskfs.MGS.oi_scrub=&lt;br/&gt;
name: OI_scrub&lt;br/&gt;
magic: 0x4c5fd252&lt;br/&gt;
oi_files: 64&lt;br/&gt;
status: init&lt;br/&gt;
flags:&lt;br/&gt;
param:&lt;br/&gt;
time_since_last_completed: N/A&lt;br/&gt;
time_since_latest_start: N/A&lt;br/&gt;
time_since_last_checkpoint: N/A&lt;br/&gt;
latest_start_position: N/A&lt;br/&gt;
last_checkpoint_position: N/A&lt;br/&gt;
first_failure_position: N/A&lt;br/&gt;
checked: 0&lt;br/&gt;
updated: 0&lt;br/&gt;
failed: 0&lt;br/&gt;
prior_updated: 0&lt;br/&gt;
noscrub: 0&lt;br/&gt;
igif: 0&lt;br/&gt;
success_count: 0&lt;br/&gt;
run_time: 0 seconds&lt;br/&gt;
average_speed: 0 objects/sec&lt;br/&gt;
real-time_speed: N/A&lt;br/&gt;
current_position: N/A&lt;br/&gt;
osd-ldiskfs.play01-MDT0000.oi_scrub=&lt;br/&gt;
name: OI_scrub&lt;br/&gt;
magic: 0x4c5fd252&lt;br/&gt;
oi_files: 64&lt;br/&gt;
status: init&lt;br/&gt;
flags:&lt;br/&gt;
param:&lt;br/&gt;
time_since_last_completed: N/A&lt;br/&gt;
time_since_latest_start: N/A&lt;br/&gt;
time_since_last_checkpoint: N/A&lt;br/&gt;
latest_start_position: N/A&lt;br/&gt;
last_checkpoint_position: N/A&lt;br/&gt;
first_failure_position: N/A&lt;br/&gt;
checked: 0&lt;br/&gt;
updated: 0&lt;br/&gt;
failed: 0&lt;br/&gt;
prior_updated: 0&lt;br/&gt;
noscrub: 0&lt;br/&gt;
igif: 0&lt;br/&gt;
success_count: 0&lt;br/&gt;
run_time: 0 seconds&lt;br/&gt;
average_speed: 0 objects/sec&lt;br/&gt;
real-time_speed: N/A&lt;br/&gt;
current_position: N/A&lt;br/&gt;
[bnh65367@cs04r-sc-mds02-03 ~]$&lt;/p&gt;


&lt;p&gt;Note that since this is a test file system, I&apos;m going to leave it in this state for a bit longer (a day or two) in case there is some additional information I should collect. But sometime next week I will need to start the OI scrub, in the hope that this will fix it.&lt;/p&gt;</description>
                <environment>Lustre servers and clients RHEL6, clients running Lustre 1.8.9, file system upgraded from at least 1.8 (could be 1.6)</environment>
        <key id="23141">LU-4626</key>
            <summary>directories missing after upgrade from 1.8 to 2.3 then 2.4.1 then 2.4.2</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="10100">Low Priority</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="ferner">Frederik Ferner</reporter>
                        <labels>
                    </labels>
                <created>Thu, 13 Feb 2014 11:58:52 +0000</created>
                <updated>Wed, 13 Oct 2021 03:17:18 +0000</updated>
                            <resolved>Wed, 13 Oct 2021 03:17:18 +0000</resolved>
                                    <version>Lustre 2.4.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="76962" author="pjones" created="Thu, 13 Feb 2014 14:27:59 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;What do you suggest here?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="77003" author="adilger" created="Thu, 13 Feb 2014 18:22:05 +0000"  >&lt;p&gt;Per my comment in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3634&quot; title=&quot;Bad returned code on error for llapi_hsm_copy_start/end()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3634&quot;&gt;&lt;del&gt;LU-3634&lt;/del&gt;&lt;/a&gt; &lt;a href=&quot;https://jira.hpdd.intel.com/browse/LU-3934?focusedCommentId=66436&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-66436&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/browse/LU-3934?focusedCommentId=66436&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-66436&lt;/a&gt;, I think it makes sense for IGIF (1.8-created) files to be looked up in the OI first and then directly via the inode/generation if they are not found in the OI while an upgrade scrub is ongoing. Otherwise, because the igif_in_oi flag is set at the &lt;em&gt;start&lt;/em&gt; of the scrub, any IGIF lookup could fail if the IGIF has not yet been added to the OI.&lt;/p&gt;

&lt;p&gt;I don&apos;t think the problem will resolve itself without a scrub, but I&apos;ll wait until Lai and Fan Yong have a chance to debug the current situation. &lt;/p&gt;</comment>
                            <comment id="77222" author="ferner" created="Tue, 18 Feb 2014 10:34:32 +0000"  >&lt;p&gt;Are there any updates? Do you need any further debugging from our side? I need to bring the file system back to a fully working state soon, so will start OI scrub tomorrow morning if I haven&apos;t heard anything before then.&lt;/p&gt;</comment>
                            <comment id="77239" author="yong.fan" created="Tue, 18 Feb 2014 14:13:28 +0000"  >&lt;p&gt;I think it is another instance of the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3934&quot; title=&quot;Directories gone missing after 2.4 update&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3934&quot;&gt;&lt;del&gt;LU-3934&lt;/del&gt;&lt;/a&gt; failure. I suggest applying the following patches to your Lustre 2.4 in order:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#/c/7625/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7625/&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#/c/6515/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/6515/&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#/c/9140/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9140/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to trigger OI scrub manually, you can run &quot;lctl lfsck_start -M play01-MDT0000&quot;.&lt;/p&gt;</comment>
                            <comment id="77271" author="ferner" created="Tue, 18 Feb 2014 17:46:59 +0000"  >&lt;p&gt;Looking at the git log for b2_4, it seems the first two are already included in 2.4.2. Is this correct? (We are running 2.4.2 on the MDS.)&lt;/p&gt;

&lt;p&gt;[bnh65367@cs04r-sc-mds02-03 ~]$ cat /proc/fs/lustre/version&lt;br/&gt;
lustre: 2.4.2&lt;br/&gt;
kernel: patchless_client&lt;br/&gt;
build:  2.4.2-RC2--PRISTINE-2.6.32-358.23.2.el6_lustre.x86_64&lt;br/&gt;
[bnh65367@cs04r-sc-mds02-03 ~]$&lt;/p&gt;

&lt;p&gt;I haven&apos;t compiled a Lustre server kernel for a while, and I seem to remember that last time the arguments I passed to ./configure differed slightly from those used for the pre-compiled builds. Is this an issue? Would a plain &apos;git checkout + apply patches; ./autogen.sh ; ./configure ; make rpm&apos; generate a useful kernel compiled with the same options as the automatic builds? Or would I be better off just taking the Jenkins build RPMs for the last patch in your list?&lt;/p&gt;
</comment>
                            <comment id="77459" author="laisiyao" created="Thu, 20 Feb 2014 08:04:16 +0000"  >&lt;p&gt;You can use the jenkins build for the last patch directly.&lt;/p&gt;</comment>
                            <comment id="77504" author="ferner" created="Thu, 20 Feb 2014 18:43:00 +0000"  >&lt;p&gt;Thanks for the update, though I&apos;ve now compiled this manually after applying the last patch...&lt;/p&gt;

&lt;p&gt;I&apos;ve been running with the updated version on (one of the) MDS for this file system since last night. &lt;/p&gt;

&lt;p&gt;I&apos;m not entirely sure what the expectation was, but currently the clients I tested can access the directories that previously were not accessible:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[bnh65367@p60-storage ~]$ ls -l /mnt/play01/p60
total 32
drwxrwxr-x+ 2 root       dls_dasc 4096 Jun 19  2008 bin
drwxrwxr-x+ 7 root       root     4096 Jan  4  2011 data
drwxrwsr-x  2 epics_user root     4096 Jun 19  2008 epics
drwxrwxr-x+ 2 root       dls_dasc 4096 Aug  1  2008 etc
drwxrwxrwx+ 3 saslauth   saslauth 4096 Jul 28  2008 logs
drwxrwxrwx+ 2 saslauth   saslauth 4096 Jun 19  2008 scripts
drwxrwxr-x+ 6 saslauth   dls_dasc 4096 Oct 14  2009 software
drwxrwxr-x+ 2 saslauth   saslauth 4096 Jun 19  2008 var
[bnh65367@p60-storage ~]$ 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As far as I understand the output below, scrub hasn&apos;t run yet, can you confirm? (different MDS as I upgraded Lustre on the second one and then did a fail-over to the upgraded MDS).&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[bnh65367@cs04r-sc-mds02-04 ~]$ sudo lctl get_param &apos;osd-ldiskfs.*.oi_scrub&apos;
osd-ldiskfs.MGS.oi_scrub=
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: init
flags:
param:
time_since_last_completed: N/A
time_since_latest_start: N/A
time_since_last_checkpoint: N/A
latest_start_position: N/A
last_checkpoint_position: N/A
first_failure_position: N/A
checked: 0
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 0
success_count: 0
run_time: 0 seconds
average_speed: 0 objects/sec
real-time_speed: N/A
current_position: N/A
osd-ldiskfs.play01-MDT0000.oi_scrub=
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: init
flags:
param:
time_since_last_completed: N/A
time_since_latest_start: N/A
time_since_last_checkpoint: N/A
latest_start_position: N/A
last_checkpoint_position: N/A
first_failure_position: N/A
checked: 0
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 0
success_count: 0
run_time: 0 seconds
average_speed: 0 objects/sec
real-time_speed: N/A
current_position: N/A
[bnh65367@cs04r-sc-mds02-04 ~]$ 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="77817" author="laisiyao" created="Tue, 25 Feb 2014 15:48:44 +0000"  >&lt;p&gt;The result looks normal to me. OI scrub should have run during the upgrade from 2.3 to 2.4.1. Could you do that upgrade again and dump oi_scrub on 2.4.1?&lt;/p&gt;</comment>
                            <comment id="78896" author="adilger" created="Mon, 10 Mar 2014 17:29:05 +0000"  >&lt;p&gt;Lai,&lt;br/&gt;
 Could you please look at my &lt;a href=&quot;#comment-66436&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;comment-66436&lt;/a&gt; in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3934&quot; title=&quot;Directories gone missing after 2.4 update&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3934&quot;&gt;&lt;del&gt;LU-3934&lt;/del&gt;&lt;/a&gt; and make a patch for that. I don&apos;t think it makes sense to block direct IGIF lookups as soon as OI scrub starts; they should only be blocked once the scrub has finished. Before then, it should try the OI lookup first and then fall back to the direct IGIF FID if that fails.&lt;/p&gt;</comment>
                            <comment id="91999" author="laisiyao" created="Wed, 20 Aug 2014 03:17:28 +0000"  >&lt;p&gt;I&apos;ll make a patch for the issue Andreas mentioned.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="20917">LU-3934</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 20 Aug 2014 11:58:52 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwf1b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12656</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 13 Feb 2014 11:58:52 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>