<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:28:12 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2788] Sanity test_132 (lu_object.c:1982:lu_ucred_assert()) ASSERTION( uc != ((void *)0) ) failed:</title>
                <link>https://jira.whamcloud.com/browse/LU-2788</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While &lt;a href=&quot;http://review.whamcloud.com/5222&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5222&lt;/a&gt; was being tested, an assertion was triggered. So far this error has not been seen outside of this patch. &lt;/p&gt;

&lt;p&gt;The failing test is sanity test_132.&lt;/p&gt;

&lt;p&gt;As seen in the MDS logs on every occasion: &lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;14:06:06:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
14:06:06:Lustre: DEBUG MARKER: test -b /dev/lvm-MDS/P1
14:06:06:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl  		                   /dev/lvm-MDS/P1 /mnt/mds1
14:06:06:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts: 
14:06:06:LustreError: 17859:0:(fld_index.c:201:fld_index_create()) srv-lustre-MDT0000: insert range [0x000000000000000c-0x0000000100000000):0:mdt failed: rc = -17
14:06:06:Lustre: lustre-MDT0000: used disk, loading
14:06:06:Lustre: 17859:0:(mdt_lproc.c:383:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /usr/sbin/l_getidentity
14:06:06:Lustre: Increasing default stripe size to min 1048576
14:06:06:Lustre: Enabling SOM
14:06:06:LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
14:06:06:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
14:06:06:LNet: 18036:0:(debug.c:324:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
14:06:06:LNet: 18036:0:(debug.c:324:libcfs_debug_str2mask()) Skipped 1 previous similar message
14:06:06:Lustre: DEBUG MARKER: e2label /dev/lvm-MDS/P1 2&amp;gt;/dev/null
14:06:06:Lustre: 3644:0:(client.c:1845:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1360015559/real 1360015559]  req@ffff88002f949800 x1426077333348683/t0(0) o8-&amp;gt;lustre-OST0006-osc-MDT0000@10.10.4.195@tcp:28/4 lens 400/544 e 0 to 1 dl 1360015564 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
14:06:17:LustreError: 11-0: lustre-OST0004-osc-MDT0000: Communicating with 10.10.4.195@tcp, operation ost_connect failed with -19.
14:06:18:Lustre: DEBUG MARKER: lctl get_param -n timeout
14:06:19:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
14:06:19:Lustre: DEBUG MARKER: Using TIMEOUT=20
14:06:19:Lustre: DEBUG MARKER: lctl dl | grep &apos; IN osc &apos; 2&amp;gt;/dev/null | wc -l
14:06:19:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.sys.jobid_var=procname_uid
14:07:12:Lustre: MGS: haven&apos;t heard from client 2b7f8516-fc0a-afb9-790c-1965aaaa46c2 (at 10.10.4.197@tcp) in 50 seconds. I think it&apos;s dead, and I am evicting it. exp ffff880078f2e800, cur 1360015629 expire 1360015599 last 1360015579
14:07:23:Lustre: lustre-MDT0000: haven&apos;t heard from client 5fcc94dc-d9c0-7c5c-7665-6b8afe791bb0 (at 10.10.4.197@tcp) in 50 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88007832ec00, cur 1360015634 expire 1360015604 last 1360015584
14:07:23:LustreError: 17820:0:(lu_object.c:1982:lu_ucred_assert()) ASSERTION( uc != ((void *)0) ) failed: 
14:07:23:LustreError: 17820:0:(lu_object.c:1982:lu_ucred_assert()) LBUG
14:07:23:Pid: 17820, comm: ll_evictor
14:07:23:
14:07:23:Call Trace:
14:07:23: [&amp;lt;ffffffffa04d7895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
14:07:23: [&amp;lt;ffffffffa04d7e97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
14:07:23: [&amp;lt;ffffffffa0664755&amp;gt;] lu_ucred_assert+0x45/0x50 [obdclass]
14:07:23: [&amp;lt;ffffffffa0c52c66&amp;gt;] mdd_xattr_sanity_check+0x36/0x1f0 [mdd]
14:07:23: [&amp;lt;ffffffffa0c58221&amp;gt;] mdd_xattr_del+0xf1/0x540 [mdd]
14:07:23: [&amp;lt;ffffffffa0e3fe0a&amp;gt;] mdt_som_attr_set+0xfa/0x390 [mdt]
14:07:23: [&amp;lt;ffffffffa0e401ec&amp;gt;] mdt_ioepoch_close_on_eviction+0x14c/0x170 [mdt]
14:07:23: [&amp;lt;ffffffffa0f100c9&amp;gt;] ? osp_key_init+0x59/0x1a0 [osp]
14:07:23: [&amp;lt;ffffffffa0e40c4b&amp;gt;] mdt_ioepoch_close+0x2ab/0x3b0 [mdt]
14:07:23: [&amp;lt;ffffffffa0e411fe&amp;gt;] mdt_mfd_close+0x4ae/0x6e0 [mdt]
14:07:23: [&amp;lt;ffffffffa0e1297e&amp;gt;] mdt_obd_disconnect+0x3ae/0x4d0 [mdt]
14:07:23: [&amp;lt;ffffffffa061cd78&amp;gt;] class_fail_export+0x248/0x580 [obdclass]
14:07:23: [&amp;lt;ffffffffa07f9079&amp;gt;] ping_evictor_main+0x249/0x640 [ptlrpc]
14:07:23: [&amp;lt;ffffffff8105fa40&amp;gt;] ? default_wake_function+0x0/0x20
14:07:23: [&amp;lt;ffffffffa07f8e30&amp;gt;] ? ping_evictor_main+0x0/0x640 [ptlrpc]
14:07:23: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
14:07:23: [&amp;lt;ffffffffa07f8e30&amp;gt;] ? ping_evictor_main+0x0/0x640 [ptlrpc]
14:07:23: [&amp;lt;ffffffffa07f8e30&amp;gt;] ? ping_evictor_main+0x0/0x640 [ptlrpc]
14:07:23: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The root cause looks to be the client locking up, but even if the client were in a death spin it should not bring the MDS down. &lt;/p&gt;</description>
                <environment>A patch pushed via git. </environment>
        <key id="17519">LU-2788</key>
            <summary>Sanity test_132 (lu_object.c:1982:lu_ucred_assert()) ASSERTION( uc != ((void *)0) ) failed:</summary>
                <type id="7" iconUrl="https://jira.whamcloud.com/images/icons/issuetypes/task_agile.png">Technical task</type>
                            <parent id="17379">LU-2720</parent>
                                    <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="keith">Keith Mannthey</assignee>
                                    <reporter username="keith">Keith Mannthey</reporter>
                        <labels>
                    </labels>
                <created>Fri, 8 Feb 2013 15:33:00 +0000</created>
                <updated>Fri, 22 Feb 2013 14:22:49 +0000</updated>
                            <resolved>Fri, 22 Feb 2013 14:22:49 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="52075" author="keith" created="Fri, 8 Feb 2013 17:20:44 +0000"  >&lt;p&gt;The client was crashing with the following, and I believe this was part of the code changed in the patch (feel free to correct me if this is wrong). &lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;14:06:24:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n jobid_var
14:06:24:Lustre: DEBUG MARKER: cancel_lru_locks osc start
14:06:24:LustreError: 7850:0:(cl_lock.c:1111:cl_use_try()) ASSERTION( lock-&amp;gt;cll_state == CLS_CACHED ) failed: 
14:06:24:LustreError: 7850:0:(cl_lock.c:1111:cl_use_try()) LBUG
14:06:24:Pid: 7850, comm: ll_close
14:06:24:
14:06:24:Call Trace:
14:06:24: [&amp;lt;ffffffffa042f895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
14:06:24: [&amp;lt;ffffffffa042fe97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
14:06:24: [&amp;lt;ffffffffa0605c34&amp;gt;] cl_use_try+0x274/0x2e0 [obdclass]
14:06:24: [&amp;lt;ffffffffa0607580&amp;gt;] cl_lock_peek+0x110/0x180 [obdclass]
14:06:24: [&amp;lt;ffffffffa0b4d806&amp;gt;] cl_local_size+0x266/0x2e0 [lustre]
14:06:24: [&amp;lt;ffffffffa0b12a06&amp;gt;] ll_done_writing_attr+0x46/0x190 [lustre]
14:06:24: [&amp;lt;ffffffffa0b12e49&amp;gt;] ll_ioepoch_close+0x2f9/0x5b0 [lustre]
14:06:24: [&amp;lt;ffffffffa0b14710&amp;gt;] ll_close_thread+0x2a0/0x1030 [lustre]
14:06:24: [&amp;lt;ffffffff8105fa40&amp;gt;] ? default_wake_function+0x0/0x20
14:06:24: [&amp;lt;ffffffffa0b14470&amp;gt;] ? ll_close_thread+0x0/0x1030 [lustre]
14:06:24: [&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
14:06:24: [&amp;lt;ffffffffa0b14470&amp;gt;] ? ll_close_thread+0x0/0x1030 [lustre]
14:06:24: [&amp;lt;ffffffffa0b14470&amp;gt;] ? ll_close_thread+0x0/0x1030 [lustre]
14:06:24: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
14:06:24:
14:06:24:Kernel panic - not syncing: LBUG
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So basically, a client crashed; then, one timeout later, the MDS tripped over an assert and rebooted. &lt;/p&gt;</comment>
                            <comment id="52894" author="keith" created="Fri, 22 Feb 2013 14:21:56 +0000"  >&lt;p&gt;A do-while loop was removed from the cl_lock_peek code in the following version of the patch. This error was related to the earlier patch set and should not be hit again. &lt;/p&gt;</comment>
                            <comment id="52895" author="keith" created="Fri, 22 Feb 2013 14:22:49 +0000"  >&lt;p&gt;Newer versions of the patch do not have this issue. &lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzviwv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6754</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>