<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:08:50 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7430] General protection fault: 0000 upon mounting MDT</title>
                <link>https://jira.whamcloud.com/browse/LU-7430</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The error occurred during soak testing of build &apos;20151113&apos; (see &lt;a href=&quot;https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&amp;amp;spaceKey=Releases#SoakTestingonLola-20151113&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&amp;amp;spaceKey=Releases#SoakTestingonLola-20151113&lt;/a&gt;). DNE is enabled. OSTs have been formatted with &lt;em&gt;zfs&lt;/em&gt;, MDTs with &lt;em&gt;ldiskfs&lt;/em&gt; as the backend. The MDSes are configured in an active-active HA failover configuration.&lt;/p&gt;

&lt;p&gt;During mount of mdt-2 the following error messages were printed:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Nov 13 16:27:52 lola-9 kernel: LDISKFS-fs (dm-9): mounted filesystem with ordered data mode. quota=on. Opts: 
Nov 13 16:27:53 lola-9 kernel: LustreError: 6485:0:(tgt_lastrcvd.c:1458:tgt_clients_data_init()) soaked-MDT0002: duplicate export for client generation 1
Nov 13 16:27:53 lola-9 kernel: LustreError: 6485:0:(obd_config.c:575:class_setup()) setup soaked-MDT0002 failed (-114)
Nov 13 16:27:53 lola-9 kernel: LustreError: 6485:0:(obd_config.c:1663:class_config_llog_handler()) MGC192.168.1.108@o2ib10: cfg command failed: rc = -114
Nov 13 16:27:53 lola-9 kernel: Lustre:    cmd=cf003 0:soaked-MDT0002  1:soaked-MDT0002_UUID  2:2  3:soaked-MDT0002-mdtlov  4:f  
Nov 13 16:27:53 lola-9 kernel: 
Nov 13 16:27:53 lola-9 kernel: LustreError: 15c-8: MGC192.168.1.108@o2ib10: The configuration from log &apos;soaked-MDT0002&apos; failed (-114). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Nov 13 16:27:53 lola-9 kernel: LustreError: 6298:0:(obd_mount_server.c:1306:server_start_targets()) failed to start server soaked-MDT0002: -114
Nov 13 16:27:53 lola-9 kernel: LustreError: 6298:0:(obd_mount_server.c:1794:server_fill_super()) Unable to start targets: -114
Nov 13 16:27:53 lola-9 kernel: LustreError: 6298:0:(obd_config.c:622:class_cleanup()) Device 4 not setup
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;before crashing with&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&amp;lt;4&amp;gt;general protection fault: 0000 [#1] SMP
&amp;lt;4&amp;gt;last sysfs file: /sys/module/lfsck/initstate
&amp;lt;4&amp;gt;CPU 25
&amp;lt;4&amp;gt;Modules linked in: mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) ldiskfs(U) jbd2 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm scsi_dh_rdac dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sd_mod crc_t10dif ahci isci libsas wmi mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
&amp;lt;4&amp;gt;
&amp;lt;4&amp;gt;Pid: 6329, comm: obd_zombid Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.gb64632c.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
&amp;lt;4&amp;gt;RIP: 0010:[&amp;lt;ffffffffa0c4a6ed&amp;gt;]  [&amp;lt;ffffffffa0c4a6ed&amp;gt;] tgt_client_free+0x25d/0x610 [ptlrpc]
&amp;lt;4&amp;gt;RSP: 0018:ffff8808337fddd0  EFLAGS: 00010206
&amp;lt;4&amp;gt;RAX: 5a5a5a5a5a5a5a5a RBX: ffff8803b80c2400 RCX: ffff8803b80c6ec0
&amp;lt;4&amp;gt;RDX: 0000000000000007 RSI: 5a5a5a5a5a5a5a5a RDI: 0000000000000282
&amp;lt;4&amp;gt;RBP: ffff8808337fde00 R08: 5a5a5a5a5a5a5a5a R09: 5a5a5a5a5a5a5a5a
&amp;lt;4&amp;gt;R10: 5a5a5a5a5a5a5a5a R11: 0000000000000000 R12: ffff8803b630d0b0
&amp;lt;4&amp;gt;R13: 5a5a5a5a5a5a5a5a R14: 5a5a5a5a5a5a5a5a R15: 5a5a5a5a5a5a5a5a
&amp;lt;4&amp;gt;FS:  0000000000000000(0000) GS:ffff88044e520000(0000) knlGS:0000000000000000
&amp;lt;4&amp;gt;CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
&amp;lt;4&amp;gt;CR2: 0000003232070df0 CR3: 0000000001a85000 CR4: 00000000000407e0
&amp;lt;4&amp;gt;DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
&amp;lt;4&amp;gt;DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
&amp;lt;4&amp;gt;Process obd_zombid (pid: 6329, threadinfo ffff8808337fc000, task ffff880834c75520)
&amp;lt;4&amp;gt;Stack:
&amp;lt;4&amp;gt; ffff8803b6308038 ffff8803b80c2400 0000370000000000 ffff8803b80c2400
&amp;lt;4&amp;gt;&amp;lt;d&amp;gt; ffff8803b6308038 ffff880834c75520 ffff8808337fde20 ffffffffa126ff81
&amp;lt;4&amp;gt;&amp;lt;d&amp;gt; ffff8803b6308078 0000000000000000 ffff8808337fde60 ffffffffa099a350
&amp;lt;4&amp;gt;Call Trace:
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa126ff81&amp;gt;] mdt_destroy_export+0x71/0x220 [mdt]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa099a350&amp;gt;] obd_zombie_impexp_cull+0x5e0/0xac0 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa099a895&amp;gt;] obd_zombie_impexp_thread+0x65/0x190 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff81064c00&amp;gt;] ? default_wake_function+0x0/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffffa099a830&amp;gt;] ? obd_zombie_impexp_thread+0x0/0x190 [obdclass]
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8109e78e&amp;gt;] kthread+0x9e/0xc0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c28a&amp;gt;] child_rip+0xa/0x20
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8109e6f0&amp;gt;] ? kthread+0x0/0xc0
&amp;lt;4&amp;gt; [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
&amp;lt;4&amp;gt;Code: 00 00 48 c7 83 c8 02 00 00 00 00 00 00 85 d2 78 4a 4d 85 e4 0f 84 4e 02 00 00 49 8b 84 24 18 03 00 00 48 85 c0 0f 84 3d 02 00 00 &amp;lt;f0&amp;gt; 0f b3 10 19 d2 85 d2 0f 84 23 03 00 00 f6 83 6f 01 00 00 02 
&amp;lt;1&amp;gt;RIP  [&amp;lt;ffffffffa0c4a6ed&amp;gt;] tgt_client_free+0x25d/0x610 [ptlrpc]
&amp;lt;4&amp;gt; RSP &amp;lt;ffff8808337fddd0&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Attached files: console and messages logs of node &lt;tt&gt;lola-9&lt;/tt&gt;&lt;/p&gt;</description>
                <environment>lola&lt;br/&gt;
build: tip of master (df6cf859bbb29392064e6ddb701f3357e01b3a13) + patches</environment>
        <key id="33139">LU-7430</key>
            <summary>General protection fault: 0000 upon mounting MDT</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="pichong">Gregoire Pichon</assignee>
                                    <reporter username="heckes">Frank Heckes</reporter>
                        <labels>
                            <label>soak</label>
                    </labels>
                <created>Mon, 16 Nov 2015 11:56:04 +0000</created>
                <updated>Wed, 17 Aug 2016 22:12:31 +0000</updated>
                            <resolved>Fri, 18 Dec 2015 14:06:09 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="133597" author="heckes" created="Mon, 16 Nov 2015 12:07:03 +0000"  >&lt;p&gt;The crash file has been stored at &lt;tt&gt;lhn:/scratch/crashdumps/lu-7430/127.0.0.1-2015-11-13-16:28:09&lt;/tt&gt; and can be uploaded on demand to a desired storage location.&lt;/p&gt;</comment>
                            <comment id="134312" author="di.wang" created="Mon, 23 Nov 2015 22:35:13 +0000"  >&lt;p&gt;I just investigated this problem a bit. There are two problems here.&lt;/p&gt;

&lt;p&gt;1. It seems the MDT assigns duplicate generations in the last_rcvd file.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000001:00080000:8.0:1448270282.553342:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0006-mdtlov_UUID idx: 0 lr: 661425870098 srv lr: 661424965096 lx: 1518615346990020 gen 0
00000001:00080000:8.0:1448270282.553360:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0004-mdtlov_UUID idx: 1 lr: 661425866544 srv lr: 661424965096 lx: 1518608412495276 gen 0
00000001:00080000:8.0:1448270282.553368:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0003-mdtlov_UUID idx: 2 lr: 661425864081 srv lr: 661424965096 lx: 1518609357754768 gen 0
00000001:00080000:8.0:1448270282.553383:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0007-mdtlov_UUID idx: 3 lr: 661425870368 srv lr: 661424965096 lx: 1518615346992984 gen 0
00000001:00080000:8.0:1448270282.553390:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0005-mdtlov_UUID idx: 4 lr: 661425870380 srv lr: 661424965096 lx: 1518608412535008 gen 0
00000001:00080000:8.0:1448270282.553398:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0002-mdtlov_UUID idx: 5 lr: 661425870358 srv lr: 661424965096 lx: 1518609357956232 gen 0
00000001:00080000:8.0:1448270282.553404:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: soaked-MDT0000-mdtlov_UUID idx: 6 lr: 661425870382 srv lr: 661424965096 lx: 1518609357957028 gen 0
00000001:00080000:8.0:1448270282.553413:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: 7de62113-8c1f-79c4-ca4f-2848938e1c9e idx: 7 lr: 0 srv lr: 661424965096 lx: 0 gen 19
00000001:00080000:8.0:1448270282.553420:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: 477a48c7-fa0c-b67e-c777-17f9099c0649 idx: 8 lr: 618478612079 srv lr: 661424965096 lx: 0 gen 36
00000001:00080000:8.0:1448270282.553427:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: 2c75515f-eba9-e6e7-c604-c055fb7778c9 idx: 9 lr: 657130421084 srv lr: 661424965096 lx: 0 gen 37
00000001:00080000:8.0:1448270282.553434:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: 03f180c9-ba1c-7578-bc08-987c14296d5a idx: 10 lr: 0 srv lr: 661424965096 lx: 0 gen 1
00000001:00080000:8.0:1448270282.553441:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: c5426649-6edd-3483-f076-e3c10338edce idx: 11 lr: 0 srv lr: 661424965096 lx: 0 gen 39
00000001:00080000:8.0:1448270282.553447:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: 937e0860-a0d8-59ba-f989-c403afaa31e2 idx: 12 lr: 635656643323 srv lr: 661424965096 lx: 0 gen 40
00000001:00080000:8.0:1448270282.553454:0:7875:0:(tgt_lastrcvd.c:1421:tgt_clients_data_init()) RCVRNG CLIENT uuid: b81b7277-aea2-e147-05e4-b16f1a92332a idx: 13 lr: 0 srv lr: 661424965096 lx: 0 gen 1
00000001:00020000:8.0:1448270282.553459:0:7875:0:(tgt_lastrcvd.c:1458:tgt_clients_data_init()) soaked-MDT0001: duplicate export for client generation 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This causes the mount failure. Hmm, this looks like a problem with the multiple slot RPC patch.&lt;/p&gt;

&lt;p&gt;2. The error path for this mount failure does not seem to be handled correctly, which caused the panic here.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;general protection fault: 0000 [#1] SMP 
last sysfs file: /sys/module/lfsck/initstate
CPU 25 
Modules linked in: mdd(U) lod(U) mdt(U) lfsck(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic crc32c_intel libcfs(U) 8021q garp stp llc nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm scsi_dh_rdac dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core lpc_ich mfd_core i2c_i801 ioatdma sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sd_mod crc_t10dif ahci wmi isci libsas mpt2sas scsi_transport_sas raid_class mlx4_ib ib_sa ib_mad ib_core ib_addr ipv6 mlx4_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 7713, comm: obd_zombid Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.gf1f8275.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
RIP: 0010:[&amp;lt;ffffffffa0b9972d&amp;gt;]  [&amp;lt;ffffffffa0b9972d&amp;gt;] tgt_client_free+0x25d/0x610 [ptlrpc]
RSP: 0018:ffff88041b5f5dd0  EFLAGS: 00010206
RAX: 5a5a5a5a5a5a5a5a RBX: ffff880820d80400 RCX: 5a5a5a5a5a5a5a5a
RDX: 000000000000000d RSI: 5a5a5a5a5a5a5a5a RDI: 0000000000000286
RBP: ffff88041b5f5e00 R08: 5a5a5a5a5a5a5a5a R09: 5a5a5a5a5a5a5a5a
R10: 5a5a5a5a5a5a5a5a R11: 0000000000000000 R12: ffff88083366f0b0
R13: 5a5a5a5a5a5a5a5a R14: 5a5a5a5a5a5a5a5a R15: 5a5a5a5a5a5a5a5a
FS:  0000000000000000(0000) GS:ffff88044e520000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f0d4f3bd000 CR3: 0000000833499000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process obd_zombid (pid: 7713, threadinfo ffff88041b5f4000, task ffff880428628040)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="134350" author="bzzz" created="Tue, 24 Nov 2015 06:32:01 +0000"  >&lt;p&gt;the console output contains a lot of I/O errors. can this be the root cause?&lt;/p&gt;</comment>
                            <comment id="134351" author="di.wang" created="Tue, 24 Nov 2015 06:58:32 +0000"  >&lt;p&gt;Yes, but this happened several times (&amp;gt; 5 times), especially at this place in soak tests, so I doubt this is because of the I/O errors.&lt;/p&gt;</comment>
                            <comment id="134352" author="pichong" created="Tue, 24 Nov 2015 07:41:16 +0000"  >&lt;p&gt;How can I have a look at the Lustre logs of the failing MDS?&lt;/p&gt;</comment>
                            <comment id="134433" author="di.wang" created="Tue, 24 Nov 2015 18:42:29 +0000"  >&lt;p&gt;Here you are.&lt;/p&gt;</comment>
                            <comment id="134614" author="heckes" created="Thu, 26 Nov 2015 11:11:52 +0000"  >&lt;p&gt;Alex: No it isn&apos;t. For some reason multipathd queries the EMC^2 (or is it Dell now &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/wink.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;) &lt;br/&gt;
controller devices although they&apos;re blacklisted in the multipath configuration file. If you are referring to this:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
end_request: critical target error, dev sdh, sector 0
sd 0:0:1:7: [sdh]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:1:7: [sdh]  Sense Key : Illegal Request [current] 
sd 0:0:1:7: [sdh]  Add. Sense: Logical block address out of range
sd 0:0:1:7: [sdh] CDB: Read(10): 28 00 00 00 00 00 00 02 00 00
end_request: critical target error, dev sdh, sector 0
__ratelimit: 310 callbacks suppressed
Buffer I/O error on device sdh, logical block 0
Buffer I/O error on device sdh, logical block 1
Buffer I/O error on device sdh, logical block 2
Buffer I/O error on device sdh, logical block 3
Buffer I/O error on device sdh, logical block 4
Buffer I/O error on device sdh, logical block 5
Buffer I/O error on device sdh, logical block 6
Buffer I/O error on device sdh, logical block 7
Buffer I/O error on device sdh, logical block 8
Buffer I/O error on device sdh, logical block 9
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;the messages can be ignored.&lt;/p&gt;</comment>
                            <comment id="134775" author="adilger" created="Mon, 30 Nov 2015 19:21:31 +0000"  >&lt;p&gt;Gr&#233;goire, Alex, is there a reason that the generation needs to be unique between different clients? If yes, then we need to examine the code that assigns the generation, as well as the recovery code to ensure that there aren&apos;t old slots on disk that are not cleared after recovery is completed. &lt;/p&gt;</comment>
                            <comment id="134830" author="bzzz" created="Tue, 1 Dec 2015 08:23:31 +0000"  >&lt;p&gt;well, it should be unique as it&apos;s used to bind the replies to the clients. essentially it&apos;s a unique client id growing monotonically, calculated from last_rcvd on every boot. given that 1 was duplicated, it looks like the whole last_rcvd scanning process was skipped for some reason.&lt;/p&gt;</comment>
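The monotonic-generation scheme described in the comment above can be sketched as follows. This is a hypothetical Python illustration, not the actual Lustre implementation (the real logic lives in tgt_clients_data_init() and differs in detail):

```python
# Hypothetical sketch of the monotonic client-generation scheme: the
# next generation is derived by scanning the generations already
# recorded in the last_rcvd client slots.

def next_generation(scanned_generations):
    """Return one past the highest generation seen during the scan."""
    highest = max(scanned_generations, default=0)
    return highest + 1

# If the last_rcvd scan is skipped, as suspected above, the counter
# restarts near 1 and collides with generations already on disk.
```

For the slot generations visible in the debug log above (19, 36, 37, 1, 39, 40, 1), a correct scan would hand out 41 next; a scan that is skipped and restarts at 1 reproduces the duplicate-generation error.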
                            <comment id="134837" author="pichong" created="Tue, 1 Dec 2015 10:23:09 +0000"  >&lt;p&gt;I have reproduced the error handler problem in MDT recovery that leads to the GPF.&lt;/p&gt;

&lt;p&gt;In tgt_clients_data_init(), a new export is created for each valid client area detected in the last_rcvd file. A valid client area is identified by a non-zero lcd_uuid field. The creation operations include calls to class_new_export(), which initializes the obd_export structure and adds it to the hash tables, and to the tgt_client_add() routine, which updates the lu_target related fields (lut_client_bitmap).&lt;/p&gt;

&lt;p&gt;When an error is encountered, the export deletions are postponed, to be handled later by the obd_zombid thread. The export deletion operation includes a call to the tgt_client_free() routine, which updates the lu_target related fields (especially the lut_client_bitmap).&lt;/p&gt;

&lt;p&gt;However, the error in tgt_clients_data_init() is reported back up to the tgt_server_data_init() and tgt_init() routines, which free the lu_target data, including the lut_client_bitmap.&lt;/p&gt;

&lt;p&gt;When the obd_zombid thread later calls tgt_client_free(), the lut_client_bitmap has already been freed and poisoned, which makes the MDS crash.&lt;/p&gt;

&lt;p&gt;I am going to see how that error path can be handled correctly, and push a patch.&lt;br/&gt;
Note that the issue was already present before the multi slot RPC feature, but the error path was probably never taken.&lt;/p&gt;


&lt;p&gt;About the duplicate generation in the last_rcvd file, I have no idea for now what could lead to that situation.&lt;/p&gt;</comment>
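The use-after-free sequence described in the comment above can be sketched in heavily simplified form. This is an illustrative Python model with hypothetical names, not the real Lustre structures or fix:

```python
# Hypothetical sketch of the lifecycle described above: setup marks
# clients in a bitmap, the error path frees the bitmap, and a deferred
# zombie-thread cleanup runs afterwards.

class FakeTarget:
    """Stands in for lu_target; client_bitmap mimics lut_client_bitmap."""
    def __init__(self, nclients):
        self.client_bitmap = [False] * nclients

def client_add(tgt, idx):
    """Mimics tgt_client_add(): mark the client slot as in use."""
    tgt.client_bitmap[idx] = True

def client_free(tgt, idx):
    """Mimics tgt_client_free(), with a guard for the error path."""
    if tgt.client_bitmap is None:
        return -1               # bitmap already torn down; fail cleanly
    tgt.client_bitmap[idx] = False
    return 0

def target_fini(tgt):
    """Mimics the tgt_init() error path freeing the lu_target data."""
    tgt.client_bitmap = None    # here merely dropping the reference;
                                # the kernel poisons the freed memory
```

In the crash above there is no such guard: the error path frees the lu_target data first, the obd_zombid thread then writes through the freed (0x5a-poisoned) bitmap pointer, and the MDS takes a general protection fault. The guard here only illustrates the ordering problem; the actual fix (change 17424 below) reworks the MDT recovery error path.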
                            <comment id="134975" author="gerrit" created="Wed, 2 Dec 2015 13:55:03 +0000"  >&lt;p&gt;Gr&#233;goire Pichon (gregoire.pichon@bull.net) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/17424&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/17424&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7430&quot; title=&quot;General protection fault: 0000 upon mounting MDT&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7430&quot;&gt;&lt;del&gt;LU-7430&lt;/del&gt;&lt;/a&gt; mdt: better handle MDT recovery error path&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: b3341ed8f0b2568eebdec1677bcb074e3da9d410&lt;/p&gt;</comment>
                            <comment id="136818" author="gerrit" created="Fri, 18 Dec 2015 05:27:34 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/17424/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/17424/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7430&quot; title=&quot;General protection fault: 0000 upon mounting MDT&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7430&quot;&gt;&lt;del&gt;LU-7430&lt;/del&gt;&lt;/a&gt; mdt: better handle MDT recovery error path&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 0d3a07a8aa46bd190813b6e6e3da0e12c61a9d09&lt;/p&gt;</comment>
                            <comment id="136844" author="jgmitter" created="Fri, 18 Dec 2015 14:06:09 +0000"  >&lt;p&gt;Landed for 2.8&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="33993">LU-7638</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="34782">LU-7794</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="33262">LU-7455</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="19620" name="console-lola-9.log.gz" size="901480" author="heckes" created="Mon, 16 Nov 2015 12:19:28 +0000"/>
                            <attachment id="19702" name="dump_today.out" size="49157" author="di.wang" created="Tue, 24 Nov 2015 18:42:29 +0000"/>
                            <attachment id="19621" name="messages-lola-9.log.bz2" size="674621" author="heckes" created="Mon, 16 Nov 2015 12:19:28 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxt33:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>