<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:17:45 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8462] OSS keeps dropping into KDB</title>
                <link>https://jira.whamcloud.com/browse/LU-8462</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;On Tuesday, July 26th, the RAID controller firmware was updated, causing two OSSes to drop into KDB due to missing paths.  Since then, we have had a spate of crashes on the file system:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Tue Jul 26 - s394 - 2028        KDB (I/O)&lt;/li&gt;
	&lt;li&gt;Tue Jul 26 - s395 - 2000        KDB (I/O)&lt;/li&gt;
	&lt;li&gt;Fri Jul 29 - s393 - 0910        KDB (aborting journal)&lt;/li&gt;
	&lt;li&gt;Fri Jul 29 - s394 - 1922        KDB (disk)&lt;/li&gt;
	&lt;li&gt;Fri Jul 29 - s393 - 2123        KDB (disk)&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;s394 Tue Jul 26 - 2028 :&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Buffer I/O error on device dm-21, logical block 137126919
lost page write due to I/O error on dm-21
end_request: I/O error, dev dm-21, sector 1073743416
LDISKFS-fs error (device dm-21): ldiskfs_read_block_bitmap: Cannot read block bitmap - block_group = 4295, block_bitmap = 134217927
Entering kdb (current=0xffff880360d92ab0, pid 63652) on processor 0 Oops: (null)
due to oops @ 0x0
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
[0]kdb&amp;gt; [-- MARK -- Tue Jul 26 18:00:00 2016]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;s395 Tue Jul 26 - 2000 :&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;end_request: I/O error, dev dm-18, sector 759488504
end_request: I/O error, dev dm-18, sector 759488504
end_request: I/O error, dev dm-18, sector 295072
LustreError: 63719:0:(osd_io.c:1179:osd_ldiskfs_write_record()) dm-18: error reading offset 1342464 (block 327): rc = -5
LustreError: 63719:0:(osd_handler.c:1051:osd_trans_stop()) Failure in transaction hook: -5
------------[ cut here ]------------
WARNING: at fs/buffer.c:1186 mark_buffer_dirty+0x82/0xa0() (Tainted: G        W  ---------------   )
Hardware name: C1104G-RP5
Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) jbd2 lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic crc32c_intel libcfs(U) sunrpc dm_round_robin bonding ib_ucm(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) configfs ib_srp(U) scsi_transport_srp(U) ib_ipoib(U) ib_cm(U) ib_uverbs(U) ib_umad(U) dm_multipath microcode acpi_pad iTCO_wdt iTCO_vendor_support igb hwmon dca i2c_algo_bit ptp pps_core i2c_i801 i2c_core sg lpc_ich mfd_core tcp_bic ext3 jbd sd_mod crc_t10dif ahci isci libsas scsi_transport_sas wmi mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ipv6 mlx4_core(U) mlx_compat(U) memtrack(U) dm_mirror dm_region_hash dm_log dm_mod gru [last unloaded: scsi_wait_scan]
Pid: 36266, comm: jbd2/dm-6 Tainted: G        W  ---------------    2.6.32-504.30.3.el6.20151008.x86_64.lustre253 #1
Call Trace:
 [&amp;lt;ffffffff81074127&amp;gt;] ? warn_slowpath_common+0x87/0xc0
 [&amp;lt;ffffffff8107417a&amp;gt;] ? warn_slowpath_null+0x1a/0x20
 [&amp;lt;ffffffff811c1ca2&amp;gt;] ? mark_buffer_dirty+0x82/0xa0
 [&amp;lt;ffffffffa0b6b385&amp;gt;] ? __jbd2_journal_temp_unlink_buffer+0xa5/0x140 [jbd2]
 [&amp;lt;ffffffffa0b6b436&amp;gt;] ? __jbd2_journal_unfile_buffer+0x16/0x30 [jbd2]
 [&amp;lt;ffffffffa0b6b798&amp;gt;] ? __jbd2_journal_refile_buffer+0xc8/0xf0 [jbd2]
 [&amp;lt;ffffffffa0b6e548&amp;gt;] ? jbd2_journal_commit_transaction+0xdc8/0x15a0 [jbd2]
 [&amp;lt;ffffffff810873eb&amp;gt;] ? try_to_del_timer_sync+0x7b/0xe0
 [&amp;lt;ffffffffa0b73c48&amp;gt;] ? kjournald2+0xb8/0x220 [jbd2]
 [&amp;lt;ffffffff8109e120&amp;gt;] ? autoremove_wake_function+0x0/0x40
 [&amp;lt;ffffffffa0b73b90&amp;gt;] ? kjournald2+0x0/0x220 [jbd2]
 [&amp;lt;ffffffff8109dc8e&amp;gt;] ? kthread+0x9e/0xc0
 [&amp;lt;ffffffff8100c28a&amp;gt;] ? child_rip+0xa/0x20
 [&amp;lt;ffffffff8109dbf0&amp;gt;] ? kthread+0x0/0xc0
 [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
---[ end trace 8870bed99a68f953 ]---
srp_daemon[3088]: No response to inform info registration
srp_daemon[3088]: Fail to register to traps, maybe there is no opensm running on fabric or IB port is down
srp_daemon[3088]: bad MAD status (110) from lid 1
scsi host7: ib_srp: Got failed path rec status -110
scsi host7: ib_srp: Path record query failed
scsi host7: reconnect attempt 4 failed (-110)
------------[ cut here ]------------
kernel BUG at /usr/src/redhat/BUILD/lustre-2.5.3/ldiskfs/mballoc.c:3189!

Entering kdb (current=0xffff881ae3470040, pid 63719) on processor 9 due to KDB_ENTER()
[9]kdb&amp;gt; [-- MARK -- Tue Jul 26 18:00:00 2016]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;s393 Fri Jul 29 - 0910:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LDISKFS-fs error (device dm-18): ldiskfs_mb_release_inode_pa: pa free mismatch: [pa ffff8801ce41bf28] [phy 1437833216] [logic 334848] [len 1024] [free 542] [error 0] [inode 207111] [freed 1024]
Aborting journal on device dm-4.
Kernel panic - not syncing: LDISKFS-fs (device dm-18): panic forced after error

Pid: 165, comm: kswapd1 Tainted: G        W  ---------------    2.6.32-504.30.3.el6.20151008.x86_64.lustre253 #1
Call Trace:
 [&amp;lt;ffffffff81564fb9&amp;gt;] ? panic+0xa7/0x190
 [&amp;lt;ffffffffa0bc5118&amp;gt;] ? ldiskfs_commit_super+0x188/0x210 [ldiskfs]
 [&amp;lt;ffffffffa0bc5724&amp;gt;] ? ldiskfs_handle_error+0xc4/0xd0 [ldiskfs]
 [&amp;lt;ffffffffa0bc5ac2&amp;gt;] ? __ldiskfs_error+0x82/0x90 [ldiskfs]
 [&amp;lt;ffffffffa0ba920f&amp;gt;] ? ldiskfs_mb_release_inode_pa+0x26f/0x360 [ldiskfs]

Entering kdb (current=0xffff881065789520, pid 165) on processor 8 Oops: (null)
due to oops @ 0x0
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
[8]kdb&amp;gt; [-- Fri Jul 29 08:55:12 2016]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;s394 Fri Jul 29 - 1922:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Buffer I/O error on device dm-21, logical block 137126919
lost page write due to I/O error on dm-21
end_request: I/O error, dev dm-21, sector 1073743416
LDISKFS-fs error (device dm-21): ldiskfs_read_block_bitmap: Cannot read block bitmap - block_group = 4295, block_bitmap = 134217927
Entering kdb (current=0xffff880360d92ab0, pid 63652) on processor 0 Oops: (null)
due to oops @ 0x0
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
[0]kdb&amp;gt; [-- MARK -- Tue Jul 26 18:00:00 2016]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;s393 Fri Jul 29 - 2123:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[-- MARK -- Fri Jul 29 18:00:00 2016]
LDISKFS-fs error (device dm-27): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 4891corrupted: 21438 blocks free in bitmap, 25535 - in gd

Aborting journal on device dm-2.
LDISKFS-fs error (device dm-27): ldiskfs_journal_start_sb:
Kernel panic - not syncing: LDISKFS-fs (device dm-27): panic forced after error

Pid: 95204, comm: ll_ost_io01_021 Tainted: G        W  ---------------    2.6.32-504.30.3.el6.20151008.x86_64.lustre253 #1
Call Trace:
 [&amp;lt;ffffffff81564fb9&amp;gt;] ? panic+0xa7/0x190
 [&amp;lt;ffffffffa0eb0118&amp;gt;] ? ldiskfs_commit_super+0x188/0x210 [ldiskf s ]14
 out of 16 cpus in kdb, waiting for the rest, timeout in 10 second(s)
 [&amp;lt;ffffffffa0eb0724&amp;gt;] ? ldiskfs_handle_error+0xc4/0xd0 [ldiskfs]
 [&amp;lt;ffffffffa0eb0ac2&amp;gt;] ? __ldiskfs_error+0x82/0x90 [ldiskfs]
 [&amp;lt;ffffffff810f2059&amp;gt;] ? delayacct_end+0x89/0xa0
 [&amp;lt;ffffffffa0e92a29&amp;gt;] ? ldiskfs_mb_check_ondisk_bitmap+0x149/0x150 [ldiskfs]
 [&amp;lt;ffffffffa0e92aad&amp;gt;] ? ldiskfs_mb_generate_from_pa+0x7d/0x180 [ldiskfs]
 [&amp;lt;ffffffff8109e1a0&amp;gt;] ? wake_bit_function+0x0/0x50
 [&amp;lt;ffffffffa0e9528b&amp;gt;] ? ldiskfs_mb_init_cache+0x55b/0xa30 [ldiskfs]
 [&amp;lt;ffffffffa0e9587e&amp;gt;] ? ldiskfs_mb_init_group+0x11e/0x210 [ldiskfs]
 [&amp;lt;ffffffffa0e95dd5&amp;gt;] ? ldiskfs_mb_load_buddy+0x355/0x390 [ldiskfs]
 [&amp;lt;ffffffffa0e7c00b&amp;gt;] ? __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
 [&amp;lt;ffffffffa0e96b7d&amp;gt;] ? ldiskfs_mb_find_by_goal+0x6d/0x2e0 [ldiskfs]
 [&amp;lt;ffffffffa0e97019&amp;gt;] ? ldiskfs_mb_regular_allocator+0x59/0x410 [ldiskfs]
 [&amp;lt;ffffffffa0eb12e8&amp;gt;] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
 [&amp;lt;ffffffffa0e91872&amp;gt;] ? ldiskfs_mb_normalize_request+0x2c2/0x3d0 [ldiskfs]
 [&amp;lt;ffffffffa0e9923d&amp;gt;] ? ldiskfs_mb_new_blocks+0x47d/0x630 [ldiskfs]
 [&amp;lt;ffffffffa0eb1390&amp;gt;] ? ldiskfs_journal_start_sb+0x70/0x170 [ldiskfs]
 [&amp;lt;ffffffffa02369ff&amp;gt;] ? ldiskfs_ext_new_extent_cb+0x57f/0x6cc [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa0e7ecd2&amp;gt;] ? ldiskfs_ext_walk_space+0x142/0x310 [ldiskfs]
 [&amp;lt;ffffffffa0236480&amp;gt;] ? ldiskfs_ext_new_extent_cb+0x0/0x6cc [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa02361cc&amp;gt;] ? fsfilt_map_nblocks+0xcc/0xf0 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa02362f0&amp;gt;] ? fsfilt_ldiskfs_map_ext_inode_pages+0x100/0x200 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa0236475&amp;gt;] ? fsfilt_ldiskfs_map_inode_pages+0x85/0x90 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa0eb1884&amp;gt;] ? ldiskfs_dquot_initialize+0x94/0xd0 [ldiskfs]
 [&amp;lt;ffffffffa0f89342&amp;gt;] ? osd_write_commit+0x302/0x620 [osd_ldiskfs]
 [&amp;lt;ffffffffa137c094&amp;gt;] ? ofd_commitrw_write+0x684/0x11b0 [ofd]
 [&amp;lt;ffffffffa137edfd&amp;gt;] ? ofd_commitrw+0x5cd/0xbb0 [ofd]
 [&amp;lt;ffffffffa0372861&amp;gt;] ? lprocfs_counter_add+0x151/0x1d6 [lvfs]
 [&amp;lt;ffffffffa131319d&amp;gt;] ? obd_commitrw+0x11d/0x390 [ost]
 [&amp;lt;ffffffffa131cf71&amp;gt;] ? ost_brw_write+0xea1/0x15d0 [ost]
 [&amp;lt;ffffffff812b9076&amp;gt;] ? vsnprintf+0x336/0x5e0
 [&amp;lt;ffffffffa0686500&amp;gt;] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
 [&amp;lt;ffffffffa13235cc&amp;gt;] ? ost_handle+0x439c/0x44d0 [ost]
 [&amp;lt;ffffffffa03f8a44&amp;gt;] ? libcfs_id2str+0x74/0xb0 [libcfs]
 [&amp;lt;ffffffffa06d60c5&amp;gt;] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [&amp;lt;ffffffffa03fe8d5&amp;gt;] ? lc_watchdog_touch+0x65/0x170 [libcfs]
 [&amp;lt;ffffffffa06cea69&amp;gt;] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [&amp;lt;ffffffffa06d889d&amp;gt;] ? ptlrpc_main+0xafd/0x1780 [ptlrpc]
 [&amp;lt;ffffffff8100c28a&amp;gt;] ? child_rip+0xa/0x20
 [&amp;lt;ffffffffa06d7da0&amp;gt;] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
LDISKFS-fs error (device dm-27): ldiskfs_journal_start_sb: Detected aborted journal [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20

.  15 out of 16 cpus in kdb, waiting for the rest, timeout in 9 second(s)
..1 cpu is not in kdb, its state is unknown

Entering kdb (current=0xffff880e989cc040, pid 95204) on processor 7 Oops: (null)
due to oops @ 0x0
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
[7]kdb&amp;gt; [-- MARK -- Fri Jul 29 19:00:00 2016]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="38557">LU-8462</key>
            <summary>OSS keeps dropping into KDB</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="hyeung">Herbert Yeung</reporter>
                        <labels>
                    </labels>
                <created>Tue, 2 Aug 2016 01:13:50 +0000</created>
                <updated>Sat, 26 Nov 2016 09:09:48 +0000</updated>
                            <resolved>Sat, 26 Nov 2016 09:09:48 +0000</resolved>
                                    <version>Lustre 2.5.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="160564" author="green" created="Tue, 2 Aug 2016 17:27:56 +0000"  >&lt;p&gt;It looks like the RAID controller firmware update caused the controller to return I/O errors when Lustre tried to read data from the array.&lt;/p&gt;

&lt;p&gt;I imagine this needs to be resolved first to have any sort of stable operation. Once the I/O errors are completely gone, you&apos;ll probably need to run e2fsck to fix whatever on-disk damage occurred (only after the I/O errors are totally gone, or you risk making things even worse).&lt;/p&gt;

&lt;p&gt;As for the crash - yes, it should not really be happening during I/O errors, but it is a third-order after-effect. These failures are fixed in newer Lustre releases.&lt;/p&gt;</comment>
                            <comment id="160610" author="jaylan" created="Tue, 2 Aug 2016 21:36:21 +0000"  >&lt;p&gt;I saw two crash dumps on service395 (nbp6-oss16). The first crash happened after the system had been up and running for 40 days; the second crash happened during recovery after the first crash.&lt;/p&gt;

&lt;p&gt;Here is the stack trace of the first crash:&lt;/p&gt;

&lt;p&gt;      KERNEL: ../vmlinux                        &lt;br/&gt;
    DUMPFILE: vmcore  &lt;span class=&quot;error&quot;&gt;&amp;#91;PARTIAL DUMP&amp;#93;&lt;/span&gt;&lt;br/&gt;
        CPUS: 16&lt;br/&gt;
        DATE: Tue Jul 26 17:49:06 2016&lt;br/&gt;
      UPTIME: 40 days, 05:53:18&lt;br/&gt;
LOAD AVERAGE: 1.70, 1.15, 0.58&lt;br/&gt;
       TASKS: 1526&lt;br/&gt;
    NODENAME: nbp6-oss16&lt;br/&gt;
     RELEASE: 2.6.32-504.30.3.el6.20151008.x86_64.lustre253&lt;br/&gt;
     VERSION: #1 SMP Tue Oct 27 21:45:38 EDT 2015&lt;br/&gt;
     MACHINE: x86_64  (2599 Mhz)&lt;br/&gt;
      MEMORY: 128 GB&lt;br/&gt;
       PANIC: &quot;kernel BUG at /usr/src/redhat/BUILD/lustre-2.5.3/ldiskfs/mballoc.c:3189!&quot;&lt;br/&gt;
         PID: 63719&lt;br/&gt;
     COMMAND: &quot;ll_ost_io02_020&quot;&lt;br/&gt;
        TASK: ffff881ae3470040  &lt;span class=&quot;error&quot;&gt;&amp;#91;THREAD_INFO: ffff881233f5a000&amp;#93;&lt;/span&gt;&lt;br/&gt;
         CPU: 9&lt;br/&gt;
       STATE: TASK_RUNNING (PANIC)&lt;/p&gt;

&lt;p&gt;crash&amp;gt; bt&lt;br/&gt;
PID: 63719  TASK: ffff881ae3470040  CPU: 9   COMMAND: &quot;ll_ost_io02_020&quot;&lt;br/&gt;
 #0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b030&amp;#93;&lt;/span&gt; machine_kexec at ffffffff8103b5db&lt;br/&gt;
 #1 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b090&amp;#93;&lt;/span&gt; crash_kexec at ffffffff810c9432&lt;br/&gt;
 #2 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b160&amp;#93;&lt;/span&gt; kdb_kdump_check at ffffffff8129ad17&lt;br/&gt;
 #3 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b170&amp;#93;&lt;/span&gt; kdb_main_loop at ffffffff8129df07&lt;br/&gt;
 #4 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b280&amp;#93;&lt;/span&gt; kdb_save_running at ffffffff8129806e&lt;br/&gt;
 #5 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b290&amp;#93;&lt;/span&gt; kdba_main_loop at ffffffff814806a8&lt;br/&gt;
 #6 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b2d0&amp;#93;&lt;/span&gt; kdb at ffffffff8129b206&lt;br/&gt;
 #7 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b340&amp;#93;&lt;/span&gt; report_bug at ffffffff812ae6b3&lt;br/&gt;
 #8 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b370&amp;#93;&lt;/span&gt; die at ffffffff8101106f&lt;br/&gt;
 #9 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b3a0&amp;#93;&lt;/span&gt; do_trap at ffffffff81569614&lt;br/&gt;
#10 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b400&amp;#93;&lt;/span&gt; do_invalid_op at ffffffff8100d025&lt;br/&gt;
#11 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b4a0&amp;#93;&lt;/span&gt; invalid_op at ffffffff8100c01b&lt;br/&gt;
    &lt;span class=&quot;error&quot;&gt;&amp;#91;exception RIP: ldiskfs_mb_use_inode_pa+186&amp;#93;&lt;/span&gt;&lt;br/&gt;
    RIP: ffffffffa0ba5d2a  RSP: ffff881233f5b550  RFLAGS: 00010202&lt;br/&gt;
    RAX: 000000000000000b  RBX: ffff881e610fd898  RCX: ffff881e610fd8cc&lt;br/&gt;
    RDX: 0000000000000400  RSI: 0000000005eec7f4  RDI: 0000000000008000&lt;br/&gt;
    RBP: ffff881233f5b570   R8: ffff881e610fd8d0   R9: 0000000000000000&lt;br/&gt;
    R10: 0000000000000010  R11: 0000000000000000  R12: ffff882069a3dd68&lt;br/&gt;
    R13: 0000000005eec800  R14: 000000000000000c  R15: 00000000ffffffff&lt;br/&gt;
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018&lt;br/&gt;
#12 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b578&amp;#93;&lt;/span&gt; ldiskfs_mb_use_preallocated at ffffffffa0ba5f59 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
#13 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b5d8&amp;#93;&lt;/span&gt; ldiskfs_mb_new_blocks at ffffffffa0bae0bd &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
#14 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b678&amp;#93;&lt;/span&gt; ldiskfs_ext_new_extent_cb at ffffffffa01be9ff &lt;span class=&quot;error&quot;&gt;&amp;#91;fsfilt_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
#15 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b728&amp;#93;&lt;/span&gt; ldiskfs_ext_walk_space at ffffffffa0b93cd2 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
#16 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b7b8&amp;#93;&lt;/span&gt; fsfilt_map_nblocks at ffffffffa01be1cc &lt;span class=&quot;error&quot;&gt;&amp;#91;fsfilt_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
#17 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b818&amp;#93;&lt;/span&gt; fsfilt_ldiskfs_map_ext_inode_pages at ffffffffa01be2f0 &lt;span class=&quot;error&quot;&gt;&amp;#91;fsfilt_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
#18 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b898&amp;#93;&lt;/span&gt; fsfilt_ldiskfs_map_inode_pages at ffffffffa01be475 &lt;span class=&quot;error&quot;&gt;&amp;#91;fsfilt_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
#19 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b8d8&amp;#93;&lt;/span&gt; osd_write_commit at ffffffffa1488342 &lt;span class=&quot;error&quot;&gt;&amp;#91;osd_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
#20 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b948&amp;#93;&lt;/span&gt; ofd_commitrw_write at ffffffffa1ae5094 &lt;span class=&quot;error&quot;&gt;&amp;#91;ofd&amp;#93;&lt;/span&gt;&lt;br/&gt;
#21 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5b9d8&amp;#93;&lt;/span&gt; ofd_commitrw at ffffffffa1ae7dfd &lt;span class=&quot;error&quot;&gt;&amp;#91;ofd&amp;#93;&lt;/span&gt;&lt;br/&gt;
#22 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5ba68&amp;#93;&lt;/span&gt; obd_commitrw at ffffffffa1a7c19d &lt;span class=&quot;error&quot;&gt;&amp;#91;ost&amp;#93;&lt;/span&gt;&lt;br/&gt;
#23 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5bae8&amp;#93;&lt;/span&gt; ost_brw_write at ffffffffa1a85f71 &lt;span class=&quot;error&quot;&gt;&amp;#91;ost&amp;#93;&lt;/span&gt;&lt;br/&gt;
#24 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5bc78&amp;#93;&lt;/span&gt; ost_handle at ffffffffa1a8c5cc &lt;span class=&quot;error&quot;&gt;&amp;#91;ost&amp;#93;&lt;/span&gt;&lt;br/&gt;
#25 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5bdc8&amp;#93;&lt;/span&gt; ptlrpc_server_handle_request at ffffffffa06dd0c5 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
#26 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5bea8&amp;#93;&lt;/span&gt; ptlrpc_main at ffffffffa06df89d &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
#27 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff881233f5bf48&amp;#93;&lt;/span&gt; kernel_thread at ffffffff8100c28a&lt;br/&gt;
crash&amp;gt; &lt;/p&gt;

&lt;p&gt;Line 3189 of ldiskfs/mballoc.c is in ldiskfs_mb_use_inode_pa()&lt;br/&gt;
...&lt;br/&gt;
        BUG_ON(pa-&amp;gt;pa_free &amp;lt; len);&lt;br/&gt;
&amp;#8212;&lt;/p&gt;</comment>
                            <comment id="160611" author="hyeung" created="Tue, 2 Aug 2016 21:38:24 +0000"  >&lt;p&gt;I copied the wrong kdb output above for s394 Fri Jul 29 - 1922; the correct output follows:&lt;/p&gt;

&lt;p&gt;LDISKFS-fs error (device dm-27): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 4891corrupted: 21438 blocks free in bitmap, 25535 - in gd&lt;/p&gt;

&lt;p&gt;Aborting journal on device dm-2.&lt;br/&gt;
LDISKFS-fs error (device dm-27): ldiskfs_journal_start_sb:&lt;br/&gt;
Kernel panic - not syncing: LDISKFS-fs (device dm-27): panic forced after error&lt;/p&gt;

&lt;p&gt;Pid: 95204, comm: ll_ost_io01_021 Tainted: G        W  ---------------    2.6.32-504.30.3.el6.20151008.x86_64.lustre253 #1&lt;br/&gt;
Call Trace:&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81564fb9&amp;gt;&amp;#93;&lt;/span&gt; ? panic+0xa7/0x190&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0eb0118&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_commit_super+0x188/0x210 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskf s &amp;#93;&lt;/span&gt;14&lt;br/&gt;
 out of 16 cpus in kdb, waiting for the rest, timeout in 10 second(s)&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0eb0724&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_handle_error+0xc4/0xd0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0eb0ac2&amp;gt;&amp;#93;&lt;/span&gt; ? __ldiskfs_error+0x82/0x90 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff810f2059&amp;gt;&amp;#93;&lt;/span&gt; ? delayacct_end+0x89/0xa0&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e92a29&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_mb_check_ondisk_bitmap+0x149/0x150 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e92aad&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_mb_generate_from_pa+0x7d/0x180 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8109e1a0&amp;gt;&amp;#93;&lt;/span&gt; ? wake_bit_function+0x0/0x50&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e9528b&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_mb_init_cache+0x55b/0xa30 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e9587e&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_mb_init_group+0x11e/0x210 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e95dd5&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_mb_load_buddy+0x355/0x390 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e7c00b&amp;gt;&amp;#93;&lt;/span&gt; ? __ldiskfs_handle_dirty_metadata+0x7b/0x100 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e96b7d&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_mb_find_by_goal+0x6d/0x2e0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e97019&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_mb_regular_allocator+0x59/0x410 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0eb12e8&amp;gt;&amp;#93;&lt;/span&gt; ? __ldiskfs_journal_stop+0x68/0xa0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e91872&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_mb_normalize_request+0x2c2/0x3d0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e9923d&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_mb_new_blocks+0x47d/0x630 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0eb1390&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_journal_start_sb+0x70/0x170 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa02369ff&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_ext_new_extent_cb+0x57f/0x6cc &lt;span class=&quot;error&quot;&gt;&amp;#91;fsfilt_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0e7ecd2&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_ext_walk_space+0x142/0x310 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0236480&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_ext_new_extent_cb+0x0/0x6cc &lt;span class=&quot;error&quot;&gt;&amp;#91;fsfilt_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa02361cc&amp;gt;&amp;#93;&lt;/span&gt; ? fsfilt_map_nblocks+0xcc/0xf0 &lt;span class=&quot;error&quot;&gt;&amp;#91;fsfilt_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa02362f0&amp;gt;&amp;#93;&lt;/span&gt; ? fsfilt_ldiskfs_map_ext_inode_pages+0x100/0x200 &lt;span class=&quot;error&quot;&gt;&amp;#91;fsfilt_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0236475&amp;gt;&amp;#93;&lt;/span&gt; ? fsfilt_ldiskfs_map_inode_pages+0x85/0x90 &lt;span class=&quot;error&quot;&gt;&amp;#91;fsfilt_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0eb1884&amp;gt;&amp;#93;&lt;/span&gt; ? ldiskfs_dquot_initialize+0x94/0xd0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0f89342&amp;gt;&amp;#93;&lt;/span&gt; ? osd_write_commit+0x302/0x620 &lt;span class=&quot;error&quot;&gt;&amp;#91;osd_ldiskfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa137c094&amp;gt;&amp;#93;&lt;/span&gt; ? ofd_commitrw_write+0x684/0x11b0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ofd&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa137edfd&amp;gt;&amp;#93;&lt;/span&gt; ? ofd_commitrw+0x5cd/0xbb0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ofd&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0372861&amp;gt;&amp;#93;&lt;/span&gt; ? lprocfs_counter_add+0x151/0x1d6 &lt;span class=&quot;error&quot;&gt;&amp;#91;lvfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa131319d&amp;gt;&amp;#93;&lt;/span&gt; ? obd_commitrw+0x11d/0x390 &lt;span class=&quot;error&quot;&gt;&amp;#91;ost&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa131cf71&amp;gt;&amp;#93;&lt;/span&gt; ? ost_brw_write+0xea1/0x15d0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ost&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff812b9076&amp;gt;&amp;#93;&lt;/span&gt; ? vsnprintf+0x336/0x5e0&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0686500&amp;gt;&amp;#93;&lt;/span&gt; ? target_bulk_timeout+0x0/0xc0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa13235cc&amp;gt;&amp;#93;&lt;/span&gt; ? ost_handle+0x439c/0x44d0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ost&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa03f8a44&amp;gt;&amp;#93;&lt;/span&gt; ? libcfs_id2str+0x74/0xb0 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa06d60c5&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_server_handle_request+0x385/0xc00 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa03fe8d5&amp;gt;&amp;#93;&lt;/span&gt; ? lc_watchdog_touch+0x65/0x170 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa06cea69&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_wait_event+0xa9/0x2d0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa06d889d&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_main+0xafd/0x1780 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt; &lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100c28a&amp;gt;&amp;#93;&lt;/span&gt; ? child_rip+0xa/0x20&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa06d7da0&amp;gt;&amp;#93;&lt;/span&gt; ? ptlrpc_main+0x0/0x1780 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
LDISKFS-fs error (device dm-27): ldiskfs_journal_start_sb: Detected aborted journal &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100c280&amp;gt;&amp;#93;&lt;/span&gt; ? child_rip+0x0/0x20&lt;/p&gt;

&lt;p&gt;.  15 out of 16 cpus in kdb, waiting for the rest, timeout in 9 second(s)&lt;br/&gt;
..1 cpu is not in kdb, its state is unknown&lt;/p&gt;

&lt;p&gt;Entering kdb (current=0xffff880e989cc040, pid 95204) on processor 7 Oops: (null)&lt;br/&gt;
due to oops @ 0x0&lt;br/&gt;
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;7&amp;#93;&lt;/span&gt;kdb&amp;gt; &lt;span class=&quot;error&quot;&gt;&amp;#91;-- MARK -- Fri Jul 29 19:00:00 2016&amp;#93;&lt;/span&gt;&lt;/p&gt;</comment>
                            <comment id="160613" author="hyeung" created="Tue, 2 Aug 2016 21:46:08 +0000"  >&lt;p&gt;To clarify, the controllers were being downgraded because the latest version had a firmware bug.  They were downgraded to the version that all of our other RAIDs are running (which do not experience this problem).  During the downgrade process, all access to the disks was lost, resulting in (expected) I/O errors.  So the crashes on Tuesday are understandable (though I am glad we won&apos;t be running into them in the future once we upgrade).&lt;/p&gt;

&lt;p&gt;I have scanned through the controller logs and do not see any signs of an ongoing disk corruption problem (though it could be invisible).  When the disks were yanked out from under Lustre, could this have corrupted the file system, causing the subsequent crashes?  If so, would an lfsck clear up the problems?&lt;/p&gt;</comment>
                            <comment id="160687" author="green" created="Wed, 3 Aug 2016 17:00:07 +0000"  >&lt;p&gt;From the messages it looks like the bitmaps got corrupted in your case.&lt;br/&gt;
The patches in bug &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7114&quot; title=&quot;ldiskfs: corrupted bitmaps handling patches&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7114&quot;&gt;&lt;del&gt;LU-7114&lt;/del&gt;&lt;/a&gt; should convert that from a fatal event into a non-fatal one. A recent e2fsck should also fix that, I believe.&lt;/p&gt;</comment>
                            <comment id="160705" author="pjones" created="Wed, 3 Aug 2016 18:54:13 +0000"  >&lt;p&gt;Fan Yong&lt;/p&gt;

&lt;p&gt;Is the fix from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7114&quot; title=&quot;ldiskfs: corrupted bitmaps handling patches&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7114&quot;&gt;&lt;del&gt;LU-7114&lt;/del&gt;&lt;/a&gt; sufficient to address this issue? If so could you please port it to the 2.7 FE branch?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="160741" author="yong.fan" created="Thu, 4 Aug 2016 00:05:09 +0000"  >&lt;p&gt;Originally, bitmap corruption was fatal: just as this ticket describes, the system becomes unavailable. The patch from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7114&quot; title=&quot;ldiskfs: corrupted bitmaps handling patches&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7114&quot;&gt;&lt;del&gt;LU-7114&lt;/del&gt;&lt;/a&gt; does NOT fix the bitmap corruption; instead, it marks the group(s) with bad bitmap(s) and skips the errors, so the system remains usable in such a trouble case. I have back-ported the patch to b2_7_fe (&lt;a href=&quot;http://review.whamcloud.com/21705&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/21705&lt;/a&gt;), but it only keeps the system available; the customer still needs to run e2fsck to fix the corrupted bitmap(s).&lt;/p&gt;</comment>
                            <comment id="161323" author="hyeung" created="Tue, 9 Aug 2016 19:47:46 +0000"  >&lt;p&gt;With the patch, will we still get error messages/warnings like the one below?  We want to create a SEC rule to notify us if it happens again.&lt;/p&gt;

&lt;p&gt;LDISKFS-fs error (device dm-27): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 4891corrupted: 21438 blocks free in bitmap, 25535 - in gd&lt;/p&gt;</comment>
                            <comment id="161718" author="yong.fan" created="Fri, 12 Aug 2016 06:46:03 +0000"  >&lt;p&gt;With the patch applied, there may still be some messages, but they will be logged as warnings (#define KERN_WARNING &quot;&amp;lt;4&amp;gt;&quot;), not errors (#define KERN_CRIT &quot;&amp;lt;2&amp;gt;&quot;).&lt;/p&gt;</comment>
                            <comment id="163053" author="yong.fan" created="Wed, 24 Aug 2016 17:50:21 +0000"  >&lt;p&gt;Any feedback? Thanks!&lt;/p&gt;</comment>
                            <comment id="163453" author="ndauchy" created="Mon, 29 Aug 2016 18:21:27 +0000"  >&lt;p&gt;What kind of feedback were you looking for?&lt;/p&gt;

&lt;p&gt;We did run e2fsck on the related servers and it corrected some errors.&lt;/p&gt;

&lt;p&gt;It looks like the backport patch requires review and landing to the 2.7 FE branch.&lt;/p&gt;</comment>
                            <comment id="163499" author="yong.fan" created="Mon, 29 Aug 2016 23:53:44 +0000"  >&lt;p&gt;I meant: have you tried the patch? If so, were there any results? Also, is there anything else you would like us to do?&lt;/p&gt;</comment>
                            <comment id="163500" author="ndauchy" created="Tue, 30 Aug 2016 00:06:22 +0000"  >&lt;p&gt;We have not tried the patch at NASA yet; we were waiting for review, testing, and landing to complete. Since we fixed the on-disk corruption with e2fsck, I don&apos;t believe we have a test case for it anymore anyway.  As you described it, the patch seems sufficient, so please proceed with landing it on the 2.7 FE branch and we will pull it into a future build.  Thanks!&lt;/p&gt;</comment>
                            <comment id="175083" author="yong.fan" created="Sat, 26 Nov 2016 09:09:48 +0000"  >&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#/c/21705/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/21705/&lt;/a&gt; has been landed to b2_7_fe.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="31980">LU-7114</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="38558">LU-8463</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzyj8f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>