<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:17:45 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8463] OSSes drop into KDB during recovery</title>
                <link>https://jira.whamcloud.com/browse/LU-8463</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;During the crashes reported in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8462&quot; title=&quot;OSS keeps dropping into KDB&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8462&quot;&gt;&lt;del&gt;LU-8462&lt;/del&gt;&lt;/a&gt;, several times upon recovery, the OSS will drop into KDB. Here&apos;s a expanded timeline:&lt;/p&gt;

&lt;p&gt;Tues Jul 26 - 394 - 2028 KDB (I/O)&lt;/p&gt;

&lt;p&gt;    395 - 2000 KDB (I/O), (disk) crash during recovery, until e2fsck&lt;br/&gt;
    Fri Jul 29 - s393 - 0910 KDB (aborting journal)&lt;br/&gt;
    Fri Jul 29 - s394 - 1922 KDB (disk), (disk) crash during recovery, e2fsck&lt;br/&gt;
    Fri Jul 29 - s393 - 2123 KDB (disk), (disk) crash during recovery, no e2fsck&lt;br/&gt;
    Sun Jul 31 - s393 - 1519 KDB (disk)&lt;/p&gt;

&lt;p&gt;After the initial crash during recovery on Tuesday, I ran e2fsck on the node and was able to successfully mount the OSTs. However, running e2fsck may not be necessary, as on one occasion it was able to recover after a second attempt.&lt;/p&gt;</description>
                <environment></environment>
        <key id="38558">LU-8463</key>
            <summary>OSSes drop into KDB during recovery</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="hyeung">Herbert Yeung</reporter>
                        <labels>
                    </labels>
                <created>Tue, 2 Aug 2016 01:15:08 +0000</created>
                <updated>Thu, 8 Sep 2016 18:54:19 +0000</updated>
                            <resolved>Fri, 2 Sep 2016 02:51:44 +0000</resolved>
                                    <version>Lustre 2.5.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="160492" author="hyeung" created="Tue, 2 Aug 2016 01:40:21 +0000"  >&lt;p&gt;Accidentally hit CTRL + enter, submitting the bug before inputting data, so here it is:&lt;/p&gt;

&lt;p&gt;During the crashes reported in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8462&quot; title=&quot;OSS keeps dropping into KDB&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8462&quot;&gt;&lt;del&gt;LU-8462&lt;/del&gt;&lt;/a&gt;, several times upon recovery, the OSS will drop into KDB.  Here&apos;s an expanded timeline:&lt;/p&gt;

&lt;p&gt;Tues Jul 26 - 394 - 2028        KDB (I/O)&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;395 - 2000        KDB (I/O), (disk) crash during recovery, until e2fsck&lt;br/&gt;
Fri Jul 29 - s393 - 0910        KDB (aborting journal)&lt;br/&gt;
Fri Jul 29 - s394 - 1922        KDB (disk), (disk) crash during recovery, e2fsck&lt;br/&gt;
Fri Jul 29 - s393 - 2123        KDB (disk), (disk) crash during recovery, no e2fsck&lt;br/&gt;
Sun Jul 31 - s393 - 1519        KDB (disk)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;After the initial crash during recovery on Tuesday, I ran e2fsck on the node and was able to successfully mount the OSTs.  However, running e2fsck may not be necessary, as on one occasion it was able to recover after a second attempt.&lt;/p&gt;

&lt;p&gt;Also, there appears to be a discrepancy in the device that the journal aborts on compared to where the LDISKFS-fs error is reported on.  Is this normal?&lt;br/&gt;
s395 Tues Jul 26 - 2000:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LDISKFS-fs error (device dm-15): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 3037corrupted: 31949 blocks free in bitmap, 14347 - in gd

Aborting journal on device dm-6.
Kernel panic - not syncing: LDISKFS-fs (device dm-15): panic forced after error
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Below is the output from the crashes during recovery/e2fsck output:&lt;/p&gt;

&lt;p&gt;s395  Tues Jul 26 - 2000 crash during recovery:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: nbp6-OST00b7: deleting orphan objects from 0x0:549495 to 0x0:549665
Lustre: nbp6-OST00bb: deleting orphan objects from 0x0:549224 to 0x0:549281
Lustre: nbp6-OST00cf: Recovery over after 6:43, of 12593 clients 10903 recovered and 1690 were evicted.
Lustre: nbp6-OST00cf: deleting orphan objects from 0x0:550044 to 0x0:550113
LDISKFS-fs error (device dm-15): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 3037corrupted: 31949 blocks free in bitmap, 14347 - in gd

Aborting journal on device dm-6.
Kernel panic - not syncing: LDISKFS-fs (device dm-15): panic forced after error

Pid: 17694, comm: ll_ost_io52_006 Not tainted 2.6.32-504.30.3.el6.20151008.x86_64.lustre253 #1
10 out of 16 cpus in kdb, waiting for the rest, timeout in 10 second(s)
Call Trace:
 [&amp;lt;ffffffff81564fb9&amp;gt;] ? panic+0xa7/0x190
 [&amp;lt;ffffffffa0eb9118&amp;gt;] ? ldiskfs_commit_super+0x188/0x210 [ldiskfs]
 [&amp;lt;ffffffffa0eb9724&amp;gt;] ? ldiskfs_handle_error+0xc4/0xd0 [ldiskfs]
 [&amp;lt;ffffffffa0eb9ac2&amp;gt;] ? __ldiskfs_error+0x82/0x90 [ldiskfs]
 [&amp;lt;ffffffff810f2059&amp;gt;] ? delayacct_end+0x89/0xa0
 [&amp;lt;ffffffffa0e9ba29&amp;gt;] ? ldiskfs_mb_check_ondisk_bitmap+0x149/0x150 [ldiskfs]
 [&amp;lt;ffffffffa0e9baad&amp;gt;] ? ldiskfs_mb_generate_from_pa+0x7d/0x180 [ldiskfs]
 [&amp;lt;ffffffff8109e1a0&amp;gt;] ? wake_bit_function+0x0/0x50
 [&amp;lt;ffffffffa0e9e28b&amp;gt;] ? ldiskfs_mb_init_cache+0x55b/0xa30 [ldiskfs]
 [&amp;lt;ffffffffa0e9e87e&amp;gt;] ? ldiskfs_mb_init_group+0x11e/0x210 [ldiskfs]
 [&amp;lt;ffffffffa0e9edd5&amp;gt;] ? ldiskfs_mb_load_buddy+0x355/0x390 [ldiskfs]
 [&amp;lt;ffffffffa0e8500b&amp;gt;] ? __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
 [&amp;lt;ffffffffa0e9fb7d&amp;gt;] ? ldiskfs_mb_find_by_goal+0x6d/0x2e0 [ldiskfs]
 [&amp;lt;ffffffffa0ea0019&amp;gt;] ? ldiskfs_mb_regular_allocator+0x59/0x410 [ldiskfs]
 [&amp;lt;ffffffffa0eba2e8&amp;gt;] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
 [&amp;lt;ffffffffa0e9a872&amp;gt;] ? ldiskfs_mb_normalize_request+0x2c2/0x3d0 [ldiskfs]
 [&amp;lt;ffffffffa0ea223d&amp;gt;] ? ldiskfs_mb_new_blocks+0x47d/0x630 [ldiskfs]
 [&amp;lt;ffffffffa0eba390&amp;gt;] ? ldiskfs_journal_start_sb+0x70/0x170 [ldiskfs]
 [&amp;lt;ffffffffa01f19ff&amp;gt;] ? ldiskfs_ext_new_extent_cb+0x57f/0x6cc [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa0e87cd2&amp;gt;] ? ldiskfs_ext_walk_space+0x142/0x310 [ldiskfs]
 [&amp;lt;ffffffffa01f1480&amp;gt;] ? ldiskfs_ext_new_extent_cb+0x0/0x6cc [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa01f11cc&amp;gt;] ? fsfilt_map_nblocks+0xcc/0xf0 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa01f12f0&amp;gt;] ? fsfilt_ldiskfs_map_ext_inode_pages+0x100/0x200 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa01f1475&amp;gt;] ? fsfilt_ldiskfs_map_inode_pages+0x85/0x90 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa0eba884&amp;gt;] ? ldiskfs_dquot_initialize+0x94/0xd0 [ldiskfs]
 [&amp;lt;ffffffffa1422342&amp;gt;] ? osd_write_commit+0x302/0x620 [osd_ldiskfs]
 [&amp;lt;ffffffffa17cb094&amp;gt;] ? ofd_commitrw_write+0x684/0x11b0 [ofd]
 [&amp;lt;ffffffffa17cddfd&amp;gt;] ? ofd_commitrw+0x5cd/0xbb0 [ofd]
 [&amp;lt;ffffffffa0374861&amp;gt;] ? lprocfs_counter_add+0x151/0x1d6 [lvfs]
 [&amp;lt;ffffffffa176219d&amp;gt;] ? obd_commitrw+0x11d/0x390 [ost]
 [&amp;lt;ffffffffa176bf71&amp;gt;] ? ost_brw_write+0xea1/0x15d0 [ost]
 [&amp;lt;ffffffff812b9076&amp;gt;] ? vsnprintf+0x336/0x5e0
 [&amp;lt;ffffffffa068f500&amp;gt;] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
 [&amp;lt;ffffffffa17725cc&amp;gt;] ? ost_handle+0x439c/0x44d0 [ost]
 [&amp;lt;ffffffffa0401a44&amp;gt;] ? libcfs_id2str+0x74/0xb0 [libcfs]
 [&amp;lt;ffffffffa06df0c5&amp;gt;] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [&amp;lt;ffffffffa04078d5&amp;gt;] ? lc_watchdog_touch+0x65/0x170 [libcfs]
 [&amp;lt;ffffffffa06d7a69&amp;gt;] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [&amp;lt;ffffffffa06e189d&amp;gt;] ? ptlrpc_main+0xafd/0x1780 [ptlrpc]
 [&amp;lt;ffffffff8100c28a&amp;gt;] ? child_rip+0xa/0x20
 [&amp;lt;ffffffffa06e0da0&amp;gt;] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
 [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
.All cpus are now in kdb

Entering kdb (current=0xffff881d9d9faab0, pid 17694) on processor 8 Oops: (null)
due to oops @ 0x0
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
[8]kdb&amp;gt; [-- Tue Jul 26 18:40:11 2016]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;s395  Tues Jul 26 - 2000 2nd crash during recovery:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: nbp6-OST00cf: Recovery over after 12:41, of 10903 clients 4619 recovered and 6284 were evicted.
Lustre: Skipped 1 previous similar message
LDISKFS-fs error (device dm-27): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 3037corrupted: 31949 blocks free in bitmap, 14347 - in gd

Aborting journal on device dm-6.
Kernel panic - not syncing: LDISKFS-fs (device dm-27): panic forced after error

Pid: 16034, comm: ll_ost_io02_002 Not tainted 2.6.32-504.30.3.el6.20151008.x86_64.lustre253 #1
Call Trace:
 [&amp;lt;ffffffff81564fb9&amp;gt;] ? panic+0xa7/0x190
 [&amp;lt;ffffffffa0b74118&amp;gt;] ? ldiskfs_commit_super+0x188/0x210 [ldiskfs]
 [&amp;lt;ffffffffa0b74724&amp;gt;] ? ldiskfs_handle_error+0xc4/0xd0 [ldiskfs]

Entering kdb (current=0xffff881e5493cab0, pid 16034) on processor 8 Oops: (null)
due to oops @ 0x0
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
[8]kdb&amp;gt; [-- Tue Jul 26 19:16:06 2016]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;s395  Tues Jul 26 - 2000 e2fsck:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;nbp6-oss16 ~ # /usr/local/bin/fscklustre.sh
nbp6-OST00CF: nbp6-OST00cf contains a file system with errors, check forced.
nbp6-OST00EF: nbp6-OST00ef: clean, 83709/7397376 files, 1005802096/1893728256 blocks
nbp6-OST00DF: nbp6-OST00df: clean, 83601/7397376 files, 1000039730/1893728256 blocks
nbp6-OST00C3: nbp6-OST00c3: clean, 83857/7397376 files, 987218431/1893728256 blocks
nbp6-OST00DB: nbp6-OST00db: clean, 83949/7397376 files, 992486492/1893728256 blocks
nbp6-OST00E3: nbp6-OST00e3: clean, 83847/7397376 files, 993140941/1893728256 blocks
nbp6-OST00BF: nbp6-OST00bf: clean, 84168/7397376 files, 993181370/1893728256 blocks
nbp6-OST00E7: nbp6-OST00e7: clean, 84352/7397376 files, 997493607/1893728256 blocks
nbp6-OST00D3: nbp6-OST00d3: clean, 83798/7397376 files, 1000758166/1893728256 blocks
nbp6-OST00D7: nbp6-OST00d7: clean, 84268/7397376 files, 994045438/1893728256 blocks
nbp6-OST00EB: nbp6-OST00eb: clean, 84081/7397376 files, 1006478293/1893728256 blocks
nbp6-OST00CB: nbp6-OST00cb: clean, 83367/7397376 files, 992639326/1893728256 blocks
nbp6-OST00C7: nbp6-OST00c7: clean, 84000/7397376 files, 998232233/1893728256 blocks
nbp6-OST00BB: nbp6-OST00bb: clean, 82971/7397376 files, 1018277281/1893728256 blocks
nbp6-OST00B7: nbp6-OST00b7: clean, 83727/7397376 files, 1006767654/1893728256 blocks
nbp6-OST00CF: nbp6-OST00cf: Inode 172464, i_blocks is 53305288, should be 53117200.  FIXED.
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 0:actual (15872000, 301) != expected (36864, 4)
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 9225:actual (10485760, 1) != expected (0, 0)
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 11913:actual (40223547392, 8) != expected (13027540992, 7)
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 11632:actual (140062720, 17) != expected (140058624, 16)
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 310286060:actual (7704576, 15) != expected (7696384, 15)
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 11968:actual (185972432896, 5936) != expected (146873962496, 4667)
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 12031:actual (2330624, 35) != expected (552960, 7)
nbp6-OST00CF: nbp6-OST00cf: Update quota info for quota type 0.
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 0:actual (15872000, 301) != expected (36864, 4)
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 41007:actual (3909255766016, 78036) != expected (1892489019392, 38304)
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 1125:actual (10485760, 1) != expected (0, 0)
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 41557:actual (179625074688, 5824) != expected (156474228736, 5039)
nbp6-OST00CF: [QUOTA WARNING] Usage inconsistent for ID 41020:actual (24576, 2) != expected (16384, 2)
nbp6-OST00CF: nbp6-OST00cf: Update quota info for quota type 1.
nbp6-OST00CF: nbp6-OST00cf: 84189/7397376 files (40.0% non-contiguous), 998870723/1893728256 blocks
ldev: Fatal: parallel command execution failed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;s394 Fri Jul 29 - 1922 crash during recovery:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LDISKFS-fs error (device dm-27): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 4891corrupted: 21438 blocks free in bitmap, 25535 - in gd

Aborting journal on device dm-2.
LDISKFS-fs error (device dm-27): ldiskfs_journal_start_sb:
Kernel panic - not syncing: LDISKFS-fs (device dm-27): panic forced after error

Pid: 95204, comm: ll_ost_io01_021 Tainted: G        W  ---------------    2.6.32-504.30.3.el6.20151008.x86_64.lustre253 #1
Call Trace:
 [&amp;lt;ffffffff81564fb9&amp;gt;] ? panic+0xa7/0x190
 [&amp;lt;ffffffffa0eb0118&amp;gt;] ? ldiskfs_commit_super+0x188/0x210 [ldiskfs]
14 out of 16 cpus in kdb, waiting for the rest, timeout in 10 second(s)
 [&amp;lt;ffffffffa0eb0724&amp;gt;] ? ldiskfs_handle_error+0xc4/0xd0 [ldiskfs]
 [&amp;lt;ffffffffa0eb0ac2&amp;gt;] ? __ldiskfs_error+0x82/0x90 [ldiskfs]
 [&amp;lt;ffffffff810f2059&amp;gt;] ? delayacct_end+0x89/0xa0
 [&amp;lt;ffffffffa0e92a29&amp;gt;] ? ldiskfs_mb_check_ondisk_bitmap+0x149/0x150 [ldiskfs]
 [&amp;lt;ffffffffa0e92aad&amp;gt;] ? ldiskfs_mb_generate_from_pa+0x7d/0x180 [ldiskfs]
 [&amp;lt;ffffffff8109e1a0&amp;gt;] ? wake_bit_function+0x0/0x50
 [&amp;lt;ffffffffa0e9528b&amp;gt;] ? ldiskfs_mb_init_cache+0x55b/0xa30 [ldiskfs]
 [&amp;lt;ffffffffa0e9587e&amp;gt;] ? ldiskfs_mb_init_group+0x11e/0x210 [ldiskfs]
 [&amp;lt;ffffffffa0e95dd5&amp;gt;] ? ldiskfs_mb_load_buddy+0x355/0x390 [ldiskfs]
 [&amp;lt;ffffffffa0e7c00b&amp;gt;] ? __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
 [&amp;lt;ffffffffa0e96b7d&amp;gt;] ? ldiskfs_mb_find_by_goal+0x6d/0x2e0 [ldiskfs]
 [&amp;lt;ffffffffa0e97019&amp;gt;] ? ldiskfs_mb_regular_allocator+0x59/0x410 [ldiskfs]
 [&amp;lt;ffffffffa0eb12e8&amp;gt;] ? __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
 [&amp;lt;ffffffffa0e91872&amp;gt;] ? ldiskfs_mb_normalize_request+0x2c2/0x3d0 [ldiskfs]
 [&amp;lt;ffffffffa0e9923d&amp;gt;] ? ldiskfs_mb_new_blocks+0x47d/0x630 [ldiskfs]
 [&amp;lt;ffffffffa0eb1390&amp;gt;] ? ldiskfs_journal_start_sb+0x70/0x170 [ldiskfs]
 [&amp;lt;ffffffffa02369ff&amp;gt;] ? ldiskfs_ext_new_extent_cb+0x57f/0x6cc [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa0e7ecd2&amp;gt;] ? ldiskfs_ext_walk_space+0x142/0x310 [ldiskfs]
 [&amp;lt;ffffffffa0236480&amp;gt;] ? ldiskfs_ext_new_extent_cb+0x0/0x6cc [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa02361cc&amp;gt;] ? fsfilt_map_nblocks+0xcc/0xf0 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa02362f0&amp;gt;] ? fsfilt_ldiskfs_map_ext_inode_pages+0x100/0x200 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa0236475&amp;gt;] ? fsfilt_ldiskfs_map_inode_pages+0x85/0x90 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa0eb1884&amp;gt;] ? ldiskfs_dquot_initialize+0x94/0xd0 [ldiskfs]
 [&amp;lt;ffffffffa0f89342&amp;gt;] ? osd_write_commit+0x302/0x620 [osd_ldiskfs]
 [&amp;lt;ffffffffa137c094&amp;gt;] ? ofd_commitrw_write+0x684/0x11b0 [ofd]
 [&amp;lt;ffffffffa137edfd&amp;gt;] ? ofd_commitrw+0x5cd/0xbb0 [ofd]
 [&amp;lt;ffffffffa0372861&amp;gt;] ? lprocfs_counter_add+0x151/0x1d6 [lvfs]
 [&amp;lt;ffffffffa131319d&amp;gt;] ? obd_commitrw+0x11d/0x390 [ost]
 [&amp;lt;ffffffffa131cf71&amp;gt;] ? ost_brw_write+0xea1/0x15d0 [ost]
 [&amp;lt;ffffffff812b9076&amp;gt;] ? vsnprintf+0x336/0x5e0
 [&amp;lt;ffffffffa0686500&amp;gt;] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
 [&amp;lt;ffffffffa13235cc&amp;gt;] ? ost_handle+0x439c/0x44d0 [ost]
 [&amp;lt;ffffffffa03f8a44&amp;gt;] ? libcfs_id2str+0x74/0xb0 [libcfs]
 [&amp;lt;ffffffffa06d60c5&amp;gt;] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [&amp;lt;ffffffffa03fe8d5&amp;gt;] ? lc_watchdog_touch+0x65/0x170 [libcfs]
 [&amp;lt;ffffffffa06cea69&amp;gt;] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [&amp;lt;ffffffffa06d889d&amp;gt;] ? ptlrpc_main+0xafd/0x1780 [ptlrpc]
 [&amp;lt;ffffffff8100c28a&amp;gt;] ? child_rip+0xa/0x20
 [&amp;lt;ffffffffa06d7da0&amp;gt;] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
LDISKFS-fs error (device dm-27): ldiskfs_journal_start_sb: Detected aborted journal
 [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20

.  15 out of 16 cpus in kdb, waiting for the rest, timeout in 9 second(s)
..1 cpu is not in kdb, its state is unknown

Entering kdb (current=0xffff880e989cc040, pid 95204) on processor 7 Oops: (null)
due to oops @ 0x0
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
[7]kdb&amp;gt; [-- MARK -- Fri Jul 29 19:00:00 2016]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;s394 Fri Jul 29 - 1922 e2fsck:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;nbp6-oss15 ~ # /usr/local/bin/ldev -c /usr/local/etc/ldev-nbp6.conf /sbin/e2fsck -p %d |tee /root/e2fsck.160729
nbp6-OST00C2: nbp6-OST00c2: clean, 64886/7397376 files, 823185966/1893728256 blocks 
nbp6-OST00E2: nbp6-OST00e2: clean, 64936/7397376 files, 763956787/1893728256 blocks
nbp6-OST00DE: nbp6-OST00de: clean, 64459/7397376 files, 766022848/1893728256 blocks
nbp6-OST00E6: nbp6-OST00e6: clean, 64496/7397376 files, 766103433/1893728256 blocks
nbp6-OST00CE: nbp6-OST00ce: clean, 65124/7397376 files, 756735149/1893728256 blocks
nbp6-OST00DA: nbp6-OST00da: clean, 65162/7397376 files, 758145783/1893728256 blocks
nbp6-OST00EE: nbp6-OST00ee: clean, 65088/7397376 files, 759562710/1893728256 blocks
nbp6-OST00BE: nbp6-OST00be: clean, 64565/7397376 files, 764337531/1893728256 blocks
nbp6-OST00B6: nbp6-OST00b6: clean, 64453/7397376 files, 771893757/1893728256 blocks
nbp6-OST00EA: nbp6-OST00ea: clean, 64333/7397376 files, 772395615/1893728256 blocks
nbp6-OST00C6: nbp6-OST00c6: clean, 64955/7397376 files, 759923311/1893728256 blocks
nbp6-OST00BA: nbp6-OST00ba: clean, 64945/7397376 files, 757944974/1893728256 blocks
nbp6-OST00CA: nbp6-OST00ca: clean, 64103/7397376 files, 777560549/1893728256 blocks
nbp6-OST00D2: nbp6-OST00d2 contains a file system with errors, check forced.
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 0:actual (15863808, 319) != expected (36864, 4)
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 30009:actual (2871015219200, 58476) != expected (2656438280192, 54656)
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 9225:actual (10485760, 1) != expected (0, 0)
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 11913:actual (50065829888, 11) != expected (22638972928, 9)
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 11968:actual (166439620608, 5272) != expected (114509180928, 3829)
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 310286060:actual (7212900352, 10) != expected (21749760, 11)
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 12031:actual (2412544, 30) != expected (638976, 8)
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 11810:actual (10202492928, 6) != expected (10199027712, 5)
nbp6-OST00D2: nbp6-OST00d2: Update quota info for quota type 0. 
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 0:actual (15863808, 319) != expected (36864, 4)
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 41007:actual (3001073184768, 59307) != expected (1945474367488, 38766)
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 1125:actual (10485760, 1) != expected (0, 0)
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 41557:actual (156396912640, 5146) != expected (104593395712, 3727)
nbp6-OST00D2: [QUOTA WARNING] Usage inconsistent for ID 41020:actual (7191187456, 1) != expected (0, 0)
nbp6-OST00D2: nbp6-OST00d2: Update quota info for quota type 1.
nbp6-OST00D2: nbp6-OST00d2: 64800/7397376 files (38.0% non-contiguous), 773234595/1893728256 blocks
nbp6-OST00D6: nbp6-OST00d6 contains a file system with errors, check forced.
nbp6-OST00D6: [QUOTA WARNING] Usage inconsistent for ID 0:actual (15863808, 312) != expected (36864, 4)
nbp6-OST00D6: [QUOTA WARNING] Usage inconsistent for ID 30009:actual (2823348002816, 58633) != expected (3762248306688, 77514)
nbp6-OST00D6: [QUOTA WARNING] Usage inconsistent for ID 11632:actual (222023680, 21) != expected (222011392, 18)
nbp6-OST00D6: [QUOTA WARNING] Usage inconsistent for ID 11968:actual (169801756672, 5385) != expected (115151048704, 3902)
nbp6-OST00D6: [QUOTA WARNING] Usage inconsistent for ID 12031:actual (3214045184, 46) != expected (503808, 5)
nbp6-OST00D6: [QUOTA WARNING] Usage inconsistent for ID 11810:actual (3606089728, 6) != expected (3605233664, 5)
nbp6-OST00D6: nbp6-OST00d6: Update quota info for quota type 0.
nbp6-OST00D6: [QUOTA WARNING] Usage inconsistent for ID 0:actual (15863808, 312) != expected (36864, 4)
nbp6-OST00D6: [QUOTA WARNING] Usage inconsistent for ID 41007:actual (2898631974912, 59507) != expected (3833795325952, 78302)
nbp6-OST00D6: [QUOTA WARNING] Usage inconsistent for ID 41557:actual (156144955392, 5256) != expected (101637935104, 3800)
nbp6-OST00D6: nbp6-OST00d6: Update quota info for quota type 1.
nbp6-OST00D6: nbp6-OST00d6: 65098/7397376 files (36.9% non-contiguous), 746402105/1893728256 blocks
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;s393 Fri Jul 29 - 2123 crash during recovery:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: dumping log to /tmp/lustre-log.1469847950.16311
LDISKFS-fs error (device dm-17): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 4891corrupted: 21438 blocks free in bitmap, 25535 - in gd

Aborting journal on device dm-2. 
LDISKFS-fs error (device dm-17): ldiskfs_journal_start_sb: Detected aborted journal
Kernel panic - not syncing: LDISKFS-fs panic from previous error

Pid: 23565, comm: ll_ost_io00_052 Tainted: G        W  ---------------    2.6.32-504.30.3.el6.20151008.x86_64.lustre253 #1
15 out of 16 cpus in kdb, waiting for the rest, timeout in 10 second(s)
Call Trace:
 [&amp;lt;ffffffff81564fb9&amp;gt;] ? panic+0xa7/0x190
 [&amp;lt;ffffffffa10c9be0&amp;gt;] ? ldiskfs_remount+0x0/0x590 [ldiskfs]
 [&amp;lt;ffffffff811c2130&amp;gt;] ? sync_buffer+0x0/0x50
 [&amp;lt;ffffffffa10ca3c8&amp;gt;] ? ldiskfs_journal_start_sb+0xa8/0x170 [ldiskfs]
 [&amp;lt;ffffffff81566a58&amp;gt;] ? out_of_line_wait_on_bit+0x78/0x90
 [&amp;lt;ffffffffa10962e0&amp;gt;] ? __ldiskfs_ext_check+0x1c0/0x240 [ldiskfs]
 [&amp;lt;ffffffff8109e1a0&amp;gt;] ? wake_bit_function+0x0/0x50
 [&amp;lt;ffffffffa01e362f&amp;gt;] ? ldiskfs_ext_new_extent_cb+0x1af/0x6cc [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa1097570&amp;gt;] ? ldiskfs_ext_find_extent+0x2a0/0x360 [ldiskfs]
 [&amp;lt;ffffffffa1097cd2&amp;gt;] ? ldiskfs_ext_walk_space+0x142/0x310 [ldiskfs]
 [&amp;lt;ffffffffa01e3480&amp;gt;] ? ldiskfs_ext_new_extent_cb+0x0/0x6cc [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa01e31cc&amp;gt;] ? fsfilt_map_nblocks+0xcc/0xf0 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa01e32f0&amp;gt;] ? fsfilt_ldiskfs_map_ext_inode_pages+0x100/0x200 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa01e3475&amp;gt;] ? fsfilt_ldiskfs_map_inode_pages+0x85/0x90 [fsfilt_ldiskfs]
 [&amp;lt;ffffffffa10ca884&amp;gt;] ? ldiskfs_dquot_initialize+0x94/0xd0 [ldiskfs]
 [&amp;lt;ffffffffa15ae342&amp;gt;] ? osd_write_commit+0x302/0x620 [osd_ldiskfs]
 [&amp;lt;ffffffffa176f094&amp;gt;] ? ofd_commitrw_write+0x684/0x11b0 [ofd]
 [&amp;lt;ffffffffa1771dfd&amp;gt;] ? ofd_commitrw+0x5cd/0xbb0 [ofd]
 [&amp;lt;ffffffffa0374861&amp;gt;] ? lprocfs_counter_add+0x151/0x1d6 [lvfs]
 [&amp;lt;ffffffffa170619d&amp;gt;] ? obd_commitrw+0x11d/0x390 [ost]
 [&amp;lt;ffffffffa170ff71&amp;gt;] ? ost_brw_write+0xea1/0x15d0 [ost]
 [&amp;lt;ffffffff812b9076&amp;gt;] ? vsnprintf+0x336/0x5e0
 [&amp;lt;ffffffffa068f500&amp;gt;] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
 [&amp;lt;ffffffffa17165cc&amp;gt;] ? ost_handle+0x439c/0x44d0 [ost]
 [&amp;lt;ffffffffa0401a44&amp;gt;] ? libcfs_id2str+0x74/0xb0 [libcfs]
 [&amp;lt;ffffffffa06df0c5&amp;gt;] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
 [&amp;lt;ffffffffa04078d5&amp;gt;] ? lc_watchdog_touch+0x65/0x170 [libcfs]
 [&amp;lt;ffffffffa06d7a69&amp;gt;] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
 [&amp;lt;ffffffffa06e189d&amp;gt;] ? ptlrpc_main+0xafd/0x1780 [ptlrpc]
 [&amp;lt;ffffffff8100c28a&amp;gt;] ? child_rip+0xa/0x20
 [&amp;lt;ffffffffa06e0da0&amp;gt;] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
 [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
.All cpus are now in kdb
 
Entering kdb (current=0xffff880b0c8d0ab0, pid 23565) on processor 1 Oops: (null)
due to oops @ 0x0
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
[1]kdb&amp;gt; [-- Fri Jul 29 20:07:23 2016]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
<comment id="160567" author="adilger" created="Tue, 2 Aug 2016 17:33:12 +0000"  >&lt;p&gt;Until the IO errors in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8462&quot; title=&quot;OSS keeps dropping into KDB&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8462&quot;&gt;&lt;del&gt;LU-8462&lt;/del&gt;&lt;/a&gt; are fixed, there appears to be ongoing corruption being introduced into this filesystem.  Running e2fsck in such an environment will only cause additional problems, because it will make bad decisions based on the ongoing corruption.  The source of the corruption needs to be fixed before e2fsck is run on the filesystem again.  Then it will be possible to address whatever problems remain.&lt;/p&gt;</comment>
                            <comment id="160615" author="hyeung" created="Tue, 2 Aug 2016 21:58:12 +0000"  >&lt;p&gt;As mentioned in the update to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8462&quot; title=&quot;OSS keeps dropping into KDB&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8462&quot;&gt;&lt;del&gt;LU-8462&lt;/del&gt;&lt;/a&gt;, I don&apos;t believe there are ongoing I/O errors.&lt;/p&gt;

&lt;p&gt;Also, in case it was missed, there appears to be a discrepancy in the device that the journal aborts on compared to where the LDISKFS-fs error is reported on.  In the below example, it indicates that the error occurs on device dm-15, yet it aborts the journal on dm-6.   Is this expected behavior, incorrect printing of the aborted journal device, or incorrect behavior?&lt;/p&gt;

&lt;p&gt;s395 Tues Jul 26 - 2000:&lt;/p&gt;

&lt;p&gt;LDISKFS-fs error (device dm-15): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 3037corrupted: 31949 blocks free in bitmap, 14347 - in gd&lt;/p&gt;

&lt;p&gt;Aborting journal on device dm-6.&lt;br/&gt;
Kernel panic - not syncing: LDISKFS-fs (device dm-15): panic forced after error&lt;/p&gt;
</comment>
                            <comment id="160821" author="pjones" created="Thu, 4 Aug 2016 17:07:47 +0000"  >&lt;p&gt;Fan Yong&lt;/p&gt;

&lt;p&gt;What do you advise here?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="160881" author="yong.fan" created="Fri, 5 Aug 2016 09:16:45 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Also, in case it was missed, there appears to be a discrepancy in the device that the journal aborts on compared to where the LDISKFS-fs error is reported on. In the below example, it indicates that the error occurs on device dm-15, yet it aborts the journal on dm-6. Is this expected behavior, incorrect printing of the aborted journal device, or incorrect behavior?&lt;br/&gt;
s395 Tues Jul 26 - 2000:&lt;br/&gt;
LDISKFS-fs error (device dm-15): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 3037corrupted: 31949 blocks free in bitmap, 14347 - in gd&lt;br/&gt;
Aborting journal on device dm-6.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This is likely related to the use of an external journal, meaning the Lustre device was on dm-15 while its journal was on dm-6. Please check to confirm.&lt;/p&gt;

&lt;p&gt;The panic happened because bitmap corruption caused the journal to be aborted, and you have set the behaviour on journal abort to &quot;errors=panic&quot;, so ldiskfs panics automatically whenever the journal aborts. If you do not want the panic, this can be changed to &quot;errors=remount-ro&quot; or &quot;errors=continue&quot;. But in your case, since some bitmaps are corrupted, letting the system continue to run may be dangerous.&lt;/p&gt;</comment>
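[Editor's note] Both points in the comment above can be checked from the shell: ldiskfs is patched ext4, so the standard e2fsprogs tools report the journal location and the on-error behaviour. A minimal sketch, run against a scratch image file rather than a real OST (the image path is made up for illustration; device names like dm-15/dm-6 come from the logs above):

```shell
# Build a small scratch ext4 image as a stand-in for an ldiskfs OST (no root needed).
dd if=/dev/zero of=/tmp/ost-scratch.img bs=1M count=8 2>/dev/null
mke2fs -F -q -t ext4 /tmp/ost-scratch.img

# Set panic-on-error, as configured on the crashing OSSes.
tune2fs -e panic /tmp/ost-scratch.img >/dev/null

# Inspect the superblock. For an external journal, dumpe2fs prints
# "Journal UUID"/"Journal device" instead of "Journal inode".
dumpe2fs -h /tmp/ost-scratch.img 2>/dev/null | grep -Ei 'errors behavior|journal'
```

On a real OST, `dumpe2fs -h /dev/dm-15` showing a "Journal device" line would confirm the external-journal layout described above, and `tune2fs -e remount-ro` (or `-e continue`) would switch away from panic-on-error.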
                            <comment id="163092" author="yong.fan" created="Wed, 24 Aug 2016 21:50:36 +0000"  >&lt;p&gt;Any feedback? Thanks!&lt;/p&gt;</comment>
<comment id="164726" author="ndauchy" created="Fri, 2 Sep 2016 00:58:23 +0000"  >&lt;p&gt;We ran e2fsck on all the servers that use the same backend disk array; it corrected some errors.&lt;/p&gt;

&lt;p&gt;This case can be closed.&lt;/p&gt;</comment>
                            <comment id="164754" author="yong.fan" created="Fri, 2 Sep 2016 02:51:44 +0000"  >&lt;p&gt;The issue has been resolved via e2fsck.&lt;/p&gt;</comment>
                            <comment id="165376" author="mhanafi" created="Thu, 8 Sep 2016 18:54:19 +0000"  >&lt;p&gt;Please close.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="38557">LU-8462</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzyj8n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>