<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:49:27 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
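As an illustration only (assuming the standard JIRA issue-XML view path, which is not shown elsewhere in this export), a full request for this issue might look like:
https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-12073/LU-12073.xml?field=key&field=summary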
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12073] conf-sanity test 123aa hangs</title>
                <link>https://jira.whamcloud.com/browse/LU-12073</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;conf-sanity test_123aa hangs. conf-sanity test 123aa was added to b2_10 on 23 Feb 2019 with patch &lt;a href=&quot;https://review.whamcloud.com/33863&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33863&lt;/a&gt;. Since that time, we&#8217;ve seen about four test sessions time out during this test, and we only see this issue on the b2_10 branch.&lt;/p&gt;

&lt;p&gt;Looking at the suite_log for the hang at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/4159964c-4363-11e9-9646-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/4159964c-4363-11e9-9646-52540065bddc&lt;/a&gt; (RHEL 6.10 client testing), the last thing we see is the file system coming up and quotas being set up&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Total disk size: 451176  block-softlimit: 452200 block-hardlimit: 474810 inode-softlimit: 79992 inode-hardlimit: 83991
Setting up quota on trevis-33vm1.trevis.whamcloud.com:/mnt/lustre for quota_usr...
+ /usr/bin/lfs setquota -u quota_usr -b 452200 -B 474810 -i 79992 -I 83991 /mnt/lustre
+ /usr/bin/lfs setquota -g quota_usr -b 452200 -B 474810 -i 79992 -I 83991 /mnt/lustre
Quota settings for quota_usr : 
Disk quotas for usr quota_usr (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre     [0]  452200  474810       -       0   79992   83991       -
lustre-MDT0000_UUID
                      0       -       0       -       0       -       0       -
lustre-OST0000_UUID
                      0       -       0       -       -       -       -       -
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Looking at console logs, some of the nodes are complaining that the MGS can&#8217;t be found. Looking at the client (vm1) console log, the last errors we see before the call traces are&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: DEBUG MARKER: mount | grep /mnt/lustre&apos; &apos;
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
Lustre: DEBUG MARKER: Using TIMEOUT=20
Lustre: DEBUG MARKER: lctl dl | grep &apos; IN osc &apos; 2&amp;gt;/dev/null | wc -l
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n jobid_var
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n jobid_var
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n jobid_var
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n jobid_var
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n jobid_var
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n jobid_var
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n jobid_var
LustreError: 4491:0:(lov_obd.c:1379:lov_quotactl()) ost 1 is inactive
LustreError: 11-0: lustre-OST0001-osc-ffff88004ddac000: operation ost_connect to node 10.9.5.160@tcp failed: rc = -19
LustreError: 11-0: lustre-OST0001-osc-ffff88004ddac000: operation ost_connect to node 10.9.5.160@tcp failed: rc = -19
LustreError: Skipped 3 previous similar messages
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Looking at the OSS (vm3) console logs, we see the same errors for each OST after the first one when the OSTs are mounted&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[48116.032988] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-ost2; mount -t lustre   		                   /dev/lvm-Role_OSS/P2 /mnt/lustre-ost2
[48116.396020] LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5
[48116.398214] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[48116.525199] LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5
[48116.527059] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache,nodelalloc
[48116.657800] LustreError: 15f-b: lustre-OST0001: cannot register this server with the MGS: rc = -17. Is the MGS running?
[48116.659704] LustreError: 19668:0:(obd_mount_server.c:1882:server_fill_super()) Unable to start targets: -17
[48116.661395] LustreError: 19668:0:(obd_mount_server.c:1592:server_put_super()) no obd lustre-OST0001
[48116.663024] LustreError: 19668:0:(obd_mount_server.c:135:server_deregister_mount()) lustre-OST0001 not registered
[48116.738331] LustreError: 19668:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount  (-17)
[48117.926245] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 10.9.5.161@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[48117.929186] LustreError: Skipped 2 previous similar messages
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and then&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[48150.341541] Lustre: DEBUG MARKER: trevis-33vm2.trevis.whamcloud.com: executing set_default_debug -1 all 4
[48152.124460] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
[48152.207544] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 10.9.5.161@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[48152.210571] LustreError: Skipped 26 previous similar messages
[48152.298637] Lustre: DEBUG MARKER: Using TIMEOUT=20
[48157.475237] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
[48217.205900] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 10.9.5.161@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[48217.208963] LustreError: Skipped 155 previous similar messages
[48347.205833] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 10.9.5.161@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[48347.208773] LustreError: Skipped 311 previous similar messages
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the MGS/MDS console log, we see the same errors for each OST after the first one&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[48111.202961] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2&amp;gt;/dev/null
[48111.503103] Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
[48115.037113] Lustre: DEBUG MARKER: /sbin/lctl mark trevis-33vm3.trevis.whamcloud.com: executing set_default_debug -1 all 4
[48115.218187] Lustre: DEBUG MARKER: trevis-33vm3.trevis.whamcloud.com: executing set_default_debug -1 all 4
[48119.050786] LustreError: 30672:0:(llog.c:391:llog_init_handle()) MGS: llog uuid mismatch: config_uuid/
[48119.052457] LustreError: 30672:0:(mgs_llog.c:1864:record_start_log()) MGS: can&apos;t start log lustre-MDT0000.1552165754.bak: rc = -17
[48119.054392] LustreError: 30672:0:(mgs_llog.c:1961:mgs_write_log_direct_all()) MGS: writing log lustre-MDT0000.1552165754.bak: rc = -17
[48119.056367] LustreError: 30672:0:(mgs_llog.c:4234:mgs_write_log_param()) err -17 on param &apos;sys.timeout=20&apos;
[48119.058095] LustreError: 30672:0:(mgs_handler.c:535:mgs_target_reg()) Failed to write lustre-OST0001 log (-17)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;SLES clients - &lt;a href=&quot;https://testing.whamcloud.com/test_sets/d89910e2-3890-11e9-8f69-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/d89910e2-3890-11e9-8f69-52540065bddc&lt;/a&gt;&lt;br/&gt;
Ubuntu clients - &lt;a href=&quot;https://testing.whamcloud.com/test_sets/0abcfbbe-4300-11e9-9646-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/0abcfbbe-4300-11e9-9646-52540065bddc&lt;/a&gt;&lt;br/&gt;
RHEL 6.10 clients - &lt;a href=&quot;https://testing.whamcloud.com/test_sets/2a566de2-4324-11e9-92fe-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/2a566de2-4324-11e9-92fe-52540065bddc&lt;/a&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="55165">LU-12073</key>
            <summary>conf-sanity test 123aa hangs</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="adilger">Andreas Dilger</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Thu, 14 Mar 2019 18:17:18 +0000</created>
                <updated>Thu, 21 Mar 2019 15:29:25 +0000</updated>
                            <resolved>Thu, 21 Mar 2019 15:29:25 +0000</resolved>
                                    <version>Lustre 2.10.7</version>
                                    <fixVersion>Lustre 2.10.7</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="243958" author="adilger" created="Thu, 14 Mar 2019 22:43:00 +0000"  >&lt;p&gt;It looks like this is some kind of problem with the current state of the system when &lt;tt&gt;setupall&lt;/tt&gt; is run, since there are complaints during OST startup before the &quot;test&quot; is actually run:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;trevis-33vm3: mount.lustre: mount /dev/mapper/lvm--Role_OSS-P2 at /mnt/lustre-ost2 failed: File exists
trevis-33vm3: mount.lustre: mount /dev/mapper/lvm--Role_OSS-P3 at /mnt/lustre-ost3 failed: File exists
trevis-33vm3: mount.lustre: mount /dev/mapper/lvm--Role_OSS-P4 at /mnt/lustre-ost4 failed: File exists
trevis-33vm3: mount.lustre: mount /dev/mapper/lvm--Role_OSS-P5 at /mnt/lustre-ost5 failed: File exists
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It looks like &lt;tt&gt;test_120&lt;/tt&gt; and &lt;tt&gt;test_122&lt;/tt&gt; are skipped for non-DNE configs (which only happens for interop &quot;full&quot; sessions, not same-version &quot;full&quot; sessions), so the last test run before &lt;tt&gt;test_123aa&lt;/tt&gt; on &lt;tt&gt;&lt;b&gt;b2_10&lt;/b&gt;&lt;/tt&gt; is &lt;tt&gt;test_109b&lt;/tt&gt; (it doesn&apos;t have &lt;tt&gt;test_110&lt;/tt&gt;, &lt;tt&gt;test_111&lt;/tt&gt;, and &lt;tt&gt;test_115&lt;/tt&gt; which also reformat the filesystem), and &lt;tt&gt;test_109&lt;/tt&gt; looks like it is only starting/stopping &lt;tt&gt;ost1&lt;/tt&gt; and &lt;tt&gt;mds1&lt;/tt&gt; facets, but &lt;tt&gt;startall&lt;/tt&gt; is trying to start all of the facets.  So while it would &lt;em&gt;appear&lt;/em&gt; that this is an interop issue, I think it is more of a test configuration issue.&lt;/p&gt;

&lt;p&gt;It seems in the tests that there is an unholy mixture of &quot;&lt;tt&gt;setup&lt;/tt&gt;&quot; vs. &quot;&lt;tt&gt;setupall&lt;/tt&gt;&quot; and it isn&apos;t always clear which one to use.  On the one hand it would be faster to avoid a reformat for every subtest, but on the other hand if there are a variety of different subtests being run it might just make sense to have every conf-sanity test start with a reformat and end with a cleanup rather than trying to re-use an unknown filesystem configuration in the next subtest.&lt;/p&gt;

&lt;p&gt;For now I think the solution is a b2_10-only patch that changes &lt;tt&gt;setupall&lt;/tt&gt; to &lt;tt&gt;setup&lt;/tt&gt;, but for master it may be that we want a more complete reorganization of how conf-sanity tests are run.  It might make sense to split &lt;tt&gt;conf-sanity.sh&lt;/tt&gt; into &lt;tt&gt;conf-sanity-reformat&lt;/tt&gt; and &lt;tt&gt;conf-sanity-keep&lt;/tt&gt; or something like that, since it is already one of the longest test sessions, but that is just a first guess and there may be better ways to do that.&lt;/p&gt;</comment>
                            <comment id="243959" author="gerrit" created="Thu, 14 Mar 2019 22:45:21 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/34428&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34428&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12073&quot; title=&quot;conf-sanity test 123aa hangs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12073&quot;&gt;&lt;del&gt;LU-12073&lt;/del&gt;&lt;/a&gt; tests: fix conf-sanity test_123 startup&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 424c36bd6120aa4c96ebcd20d25c4f95fd41cffa&lt;/p&gt;</comment>
                            <comment id="244428" author="gerrit" created="Thu, 21 Mar 2019 15:02:53 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/34428/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34428/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12073&quot; title=&quot;conf-sanity test 123aa hangs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12073&quot;&gt;&lt;del&gt;LU-12073&lt;/del&gt;&lt;/a&gt; tests: fix conf-sanity test_123 startup&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: d88eeb1b120e2c560e344f5a6e22a58b5221ac05&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00ddb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>