<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:59:27 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6351] LFSCK MDS crash: unable to handle kernel NULL pointer dereference </title>
                <link>https://jira.whamcloud.com/browse/LU-6351</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While running the stability test from the LFSCK Phase 3 test plan, the primary MDS, containing MDT0 and MDT2, crashed. The stability test creates 150 directories with 10,000 objects each: files, striped directories, links, etc. Then one process calls LFSCK namespace repeatedly while another process deletes the 150 directories and their objects, and all other processes create directories with files, striped directories, etc.&lt;/p&gt;

&lt;p&gt;The first LFSCK namespace run on all four MDTs completes. The second time LFSCK is called, the primary MDS crashes. When the MDT comes back, I see that the status of LFSCK is &#8220;crashed&#8221;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;do_facet mds1 lctl get_param -n mdd.scratch-MDT0000.lfsck_namespace
status = crashed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When the MDS comes back, I see the following errors in dmesg:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: scratch-MDT0002-osp-MDT0000: Connection restored to scratch-MDT0002 (at 0@lo)
Lustre: scratch-MDT0002: Recovery over after 0:05, of 9 clients 9 recovered and 0 were evicted.
LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166586, ql: 1, comp: 8, conn: 9, next: 4328166588, last_committed: 4328166570)
LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166599, ql: 1, comp: 8, conn: 9, next: 4328166601, last_committed: 4328166570)
LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166607, ql: 1, comp: 8, conn: 9, next: 4328166609, last_committed: 4328166570)
LustreError: 2634:0:(ldlm_lib.c:1748:check_for_next_transno()) scratch-MDT0000: waking for gap in transno, VBR is OFF (skip: 4328166613, ql: 1, comp: 8, conn: 9, next: 4328166615, last_committed: 4328166570)
Lustre: scratch-MDT0000-osp-MDT0002: Connection restored to scratch-MDT0000 (at 0@lo)
Lustre: scratch-MDT0000: Recovery over after 0:35, of 9 clients 9 recovered and 0 were evicted.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From the vmcore-dmesg:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&amp;lt;1&amp;gt;BUG: unable to handle kernel NULL pointer dereference at 00000000000000ae
&amp;lt;1&amp;gt;IP: [&amp;lt;ffffffffa0dee512&amp;gt;] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
&amp;lt;4&amp;gt;PGD 0 
&amp;lt;4&amp;gt;Oops: 0000 [#1] SMP 
&amp;lt;4&amp;gt;last sysfs file: /sys/devices/system/cpu/online
&amp;lt;4&amp;gt;CPU 9 
&amp;lt;4&amp;gt;Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) ldiskfs(U) sha512_generic sha256_generic crc32c_intel jbd2 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode iTCO_wdt iTCO_vendor_support serio_raw mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core ses enclosure sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
&amp;lt;4&amp;gt;
&amp;lt;4&amp;gt;Pid: 8092, comm: lfsck_namespace Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
&amp;lt;4&amp;gt;RIP: 0010:[&amp;lt;ffffffffa0dee512&amp;gt;]  [&amp;lt;ffffffffa0dee512&amp;gt;] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
&amp;lt;4&amp;gt;RSP: 0018:ffff880b446cdaa0  EFLAGS: 00010246
&amp;lt;4&amp;gt;RAX: 0000000000000000 RBX: ffff8807e2a16900 RCX: ffff880a0bedeb14
&amp;lt;4&amp;gt;RDX: ffff8801f990d070 RSI: ffff8807e2a16900 RDI: ffff880191c11e40
&amp;lt;4&amp;gt;RBP: ffff880b446cdb30 R08: fffffffffffffffe R09: ffffffffa0dee430
&amp;lt;4&amp;gt;R10: 0000000000000000 R11: 0000000000000002 R12: ffff880191c11e40
&amp;lt;4&amp;gt;R13: ffff880191c11e40 R14: ffff880a0bedeb14 R15: ffff880a0bedeb14
&amp;lt;4&amp;gt;FS:  0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000
&amp;lt;4&amp;gt;CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
&amp;lt;4&amp;gt;CR2: 00000000000000ae CR3: 0000000001a85000 CR4: 00000000000007e0
&amp;lt;4&amp;gt;DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
&amp;lt;4&amp;gt;DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
&amp;lt;4&amp;gt;Process lfsck_namespace (pid: 8092, threadinfo ffff880b446cc000, task ffff880c28df0040)
&amp;lt;4&amp;gt;Stack:
&amp;lt;4&amp;gt; ffff8801ac526000 ffff880a0bedeb14 000000000000001a ffff8801f990d350
&amp;lt;4&amp;gt;&amp;lt;d&amp;gt; ffff880a0bedeae8 0000000000004000 ffff880b446cdb30 ffffffff8128daa4
&amp;lt;4&amp;gt;&amp;lt;d&amp;gt; ffff8801f990d070 ffff880b446cdb40 ffff880b446cdb00 ffffffffa05e22c3
&amp;lt;4&amp;gt;Call Trace:
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8128daa4&amp;gt;] ? snprintf+0x34/0x40
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa05e22c3&amp;gt;] ? fld_server_lookup+0x53/0x330 [fld]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0eee082&amp;gt;] lfsck_namespace_check_exist+0xd2/0x410 [lfsck]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0f24fc6&amp;gt;] lfsck_namespace_handle_striped_master+0x1b6/0xb50 [lfsck]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0868931&amp;gt;] ? lu_object_find_at+0xb1/0xe0 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0ef1532&amp;gt;] lfsck_namespace_assistant_handler_p1+0xb52/0x2310 [lfsck]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81170b79&amp;gt;] ? __drain_alien_cache+0x89/0xa0
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0ee16e6&amp;gt;] lfsck_assistant_engine+0x496/0x1de0 [lfsck]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81061d00&amp;gt;] ? default_wake_function+0x0/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa0ee1250&amp;gt;] ? lfsck_assistant_engine+0x0/0x1de0 [lfsck]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8109abf6&amp;gt;] kthread+0x96/0xa0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c20a&amp;gt;] child_rip+0xa/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8109ab60&amp;gt;] ? kthread+0x0/0xa0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c200&amp;gt;] ? child_rip+0x0/0x20
&amp;lt;4&amp;gt;Code: 05 04 af 04 00 a3 16 00 00 48 c7 05 05 af 04 00 00 00 00 00 c7 05 f3 ae 04 00 01 00 00 00 e8 76 5c 76 ff 4c 8b 45 90 48 8b 43 40 &amp;lt;0f&amp;gt; b7 80 ae 00 00 00 25 00 f0 00 00 3d 00 40 00 00 0f 85 75 0a 
&amp;lt;1&amp;gt;RIP  [&amp;lt;ffffffffa0dee512&amp;gt;] osd_index_ea_lookup+0xe2/0xdc0 [osd_ldiskfs]
&amp;lt;4&amp;gt; RSP &amp;lt;ffff880b446cdaa0&amp;gt;
&amp;lt;4&amp;gt;CR2: 00000000000000ae
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I will upload the vmcore. &lt;/p&gt;</description>
                <environment>OpenSFS cluster running Lustre 2.7.0-RC-4 build #29 with two MDSs (two MDTs each), three OSSs (two OSTs each), and three clients.</environment>
        <key id="29022">LU-6351</key>
            <summary>LFSCK MDS crash: unable to handle kernel NULL pointer dereference </summary>
                <type id="7" iconUrl="https://jira.whamcloud.com/images/icons/issuetypes/task_agile.png">Technical task</type>
                            <parent id="29081">LU-6361</parent>
                                    <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                            <label>lfsck</label>
                    </labels>
                <created>Sat, 7 Mar 2015 17:59:06 +0000</created>
                <updated>Thu, 12 May 2016 16:59:19 +0000</updated>
                            <resolved>Wed, 29 Apr 2015 07:51:26 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                <comments>
                            <comment id="109162" author="jamesanunez" created="Sat, 7 Mar 2015 19:22:06 +0000"  >&lt;p&gt;vmcore and vmcore-dmesg.txt are at uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6351&quot; title=&quot;LFSCK MDS crash: unable to handle kernel NULL pointer dereference &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6351&quot;&gt;&lt;del&gt;LU-6351&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="109163" author="jamesanunez" created="Sat, 7 Mar 2015 19:27:51 +0000"  >&lt;p&gt;On one of the OSTs:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: MGC192.168.2.125@o2ib: Connection restored to MGS (at 192.168.2.125@o2ib)
Lustre: Skipped 88 previous similar messages
LustreError: 167-0: scratch-MDT0002-lwp-OST0005: This client was evicted by scratch-MDT0002; in progress operations using this service will fail.
LustreError: Skipped 69 previous similar messages
Lustre: scratch-OST0005: deleting orphan objects from 0x0:686571 to 0x0:686625
Lustre: scratch-OST0004: deleting orphan objects from 0x0:686568 to 0x0:686625
LustreError: 3986:0:(ofd_grant.c:183:ofd_grant_sanity_check()) ofd_statfs: tot_granted 262912 != fo_tot_granted 85590784
LustreError: 3986:0:(ofd_grant.c:189:ofd_grant_sanity_check()) ofd_statfs: tot_dirty 0 != fo_tot_dirty 1048576
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="109175" author="yong.fan" created="Sun, 8 Mar 2015 14:31:27 +0000"  >&lt;p&gt;In some cases, when the LFSCK locates an object via its FID, it does not check whether the object exists; further use of such an object may then dereference a NULL local object (the inode, for ldiskfs).&lt;/p&gt;

&lt;p&gt;Part of the issue has been fixed in the patch: &lt;a href=&quot;http://review.whamcloud.com/#/c/13993/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/13993/&lt;/a&gt;. But it is not enough. I will make another patch for the remaining issues.&lt;/p&gt;</comment>
                            <comment id="109176" author="gerrit" created="Sun, 8 Mar 2015 14:37:16 +0000"  >&lt;p&gt;Fan Yong (fan.yong@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/14009&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14009&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6351&quot; title=&quot;LFSCK MDS crash: unable to handle kernel NULL pointer dereference &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6351&quot;&gt;&lt;del&gt;LU-6351&lt;/del&gt;&lt;/a&gt; lfsck: check object existence before using it&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 776db8e76865f86e3be511375c55479b0817b559&lt;/p&gt;</comment>
                            <comment id="113584" author="gerrit" created="Tue, 28 Apr 2015 05:12:31 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/14009/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14009/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6351&quot; title=&quot;LFSCK MDS crash: unable to handle kernel NULL pointer dereference &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6351&quot;&gt;&lt;del&gt;LU-6351&lt;/del&gt;&lt;/a&gt; lfsck: check object existence before using it&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f3ea0cea6bb6766eaa55571774b9ae942a6bf297&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzx7xz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>17773</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>