<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:29:32 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2936] nrs_svcpt2nrs()) ASSERTION( (!(hp) || (nrs_svcpt_has_hp(svcpt))) ) failed</title>
                <link>https://jira.whamcloud.com/browse/LU-2936</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;ORNL did a full scale system test today and one of their OSSes crashed with the above assertion.&lt;/p&gt;

&lt;p&gt;We apparently never saw this before because our testing does not seriously exercise /proc/fs/lustre/health_check, even though many sites use it heavily.&lt;/p&gt;

&lt;p&gt;I was able to reproduce the issue with racer while running this line in parallel:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;while&lt;/span&gt; :; &lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt; cat /proc/fs/lustre/health_check ; done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[305098.783912] LustreError: 1075:0:(ptlrpc_internal.h:165:nrs_svcpt2nrs()) ASSERTION( (!(hp) || (nrs_svcpt_has_hp(svcpt))) ) failed: 
[305098.784415] LustreError: 1075:0:(ptlrpc_internal.h:165:nrs_svcpt2nrs()) LBUG
[305098.784682] Pid: 1075, comm: cat
[305098.784881] 
[305098.784881] Call Trace:
[305098.785248]  [&amp;lt;ffffffffa07b7915&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[305098.785524]  [&amp;lt;ffffffffa07b7f17&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
[305098.785810]  [&amp;lt;ffffffffa11b0e22&amp;gt;] ptlrpc_nrs_req_poll_nolock+0xc2/0x1c0 [ptlrpc]
[305098.786239]  [&amp;lt;ffffffffa1170696&amp;gt;] ptlrpc_svcpt_health_check+0x56/0x180 [ptlrpc]
[305098.786664]  [&amp;lt;ffffffffa1170812&amp;gt;] ptlrpc_service_health_check+0x52/0x70 [ptlrpc]
[305098.787079]  [&amp;lt;ffffffffa05d61fd&amp;gt;] ost_health_check+0x4d/0x90 [ost]
[305098.787345]  [&amp;lt;ffffffffa0e4c8e7&amp;gt;] obd_proc_read_health+0x2a7/0x3b0 [obdclass]
[305098.792327]  [&amp;lt;ffffffffa0e6f36c&amp;gt;] lprocfs_fops_read+0xec/0x1f0 [obdclass]
[305098.793699]  [&amp;lt;ffffffffa0e6f280&amp;gt;] ? lprocfs_fops_read+0x0/0x1f0 [obdclass]
[305098.793964]  [&amp;lt;ffffffff811e1cc5&amp;gt;] proc_reg_read+0x85/0xc0
[305098.794200]  [&amp;lt;ffffffff8117b9e5&amp;gt;] vfs_read+0xb5/0x1a0
[305098.794429]  [&amp;lt;ffffffff8117bb21&amp;gt;] sys_read+0x51/0x90
[305098.794657]  [&amp;lt;ffffffff8100b0f2&amp;gt;] system_call_fastpath+0x16/0x1b
[305098.794907] 
[305098.992803] Kernel panic - not syncing: LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Crashdump and modules are in /exports/crashdumps/192.168.10.210-2013-03-09-00\:44\:57&lt;/p&gt;

&lt;p&gt;The problem seems to have been introduced with the NRS code drop.&lt;/p&gt;</description>
                <environment></environment>
        <key id="17811">LU-2936</key>
            <summary>nrs_svcpt2nrs()) ASSERTION( (!(hp) || (nrs_svcpt_has_hp(svcpt))) ) failed</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="green">Oleg Drokin</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                            <label>HB</label>
                    </labels>
                <created>Sat, 9 Mar 2013 00:57:25 +0000</created>
                <updated>Wed, 13 Mar 2013 08:55:41 +0000</updated>
                            <resolved>Wed, 13 Mar 2013 08:24:57 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="53651" author="green" created="Sat, 9 Mar 2013 18:28:40 +0000"  >&lt;p&gt;Ok, I now checked the dump and the situation is clear.&lt;br/&gt;
The ptlrpc_svcpt_health_check tries to call ptlrpc_nrs_req_poll_nolock twice for every service partition that has anything pending.&lt;br/&gt;
Once with hp set to true and once with hp set to false.&lt;br/&gt;
If this happens for a service that has no hp ops registered (like ost_create), the assertion trips, because the underlying NRS code seems to assume that the caller knows the service type and treats asking for an hp request on a service that cannot have hp requests as a no-no. (This is a bit strange, considering that it&apos;s perfectly ok to check whether a service has any hp requests pending even for non-hp services.)&lt;/p&gt;

&lt;p&gt;As such, the possible fixes are:&lt;br/&gt;
1. Remove the assertion and the requirement that the caller know about the underlying service when trying to fetch requests.&lt;br/&gt;
2. Check that an hp request is actually available before ptlrpc_svcpt_health_check tries to fetch it.&lt;/p&gt;</comment>
                            <comment id="53652" author="green" created="Sat, 9 Mar 2013 18:47:27 +0000"  >&lt;p&gt;patch in &lt;a href=&quot;http://review.whamcloud.com/5665&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5665&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="53659" author="nangelinas" created="Sun, 10 Mar 2013 08:01:56 +0000"  >&lt;p&gt;As mentioned in Gerrit, this bug is addressed in the NRS follow-up patch as well, but &apos;2&apos; from the comment above that you have used is a better solution. Maybe &apos;1&apos; could be used to improve things on a future patch.&lt;/p&gt;</comment>
                            <comment id="53897" author="jlevi" created="Wed, 13 Mar 2013 08:24:57 +0000"  >&lt;p&gt;Patch landed to master.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="11119">LU-398</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvkiv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7054</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>