<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:02:27 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6696] ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5</title>
                <link>https://jira.whamcloud.com/browse/LU-6696</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 11-0: hw_nb-OST0016-osc-MDT0000: Communicating with 10.151.26.55@o2ib, operation ost_connect failed with -114.
LustreError: 6488:0:(llog_cat.c:866:llog_cat_init_and_process()) hw_nb-OST0024-osc-MDT0000: llog_process() with cat_cancel_cb failed: rc = -5
LustreError: 6580:0:(osp_sync.c:874:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5
LustreError: 6580:0:(osp_sync.c:874:osp_sync_thread()) LBUG
Pid: 6580, comm: osp-syn-36-0

Call Trace:
 [&amp;lt;ffffffffa05cf895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [&amp;lt;ffffffffa05cfe97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
 [&amp;lt;ffffffffa10d9243&amp;gt;] osp_sync_thread+0x753/0x7d0 [osp]
 [&amp;lt;ffffffff81559b9e&amp;gt;] ? thread_return+0x4e/0x770
 [&amp;lt;ffffffffa10d8af0&amp;gt;] ? osp_sync_thread+0x0/0x7d0 [osp]

Entering kdb (current=0xffff8803b5e04080, pid 6580) on processor 3 Oops: (null)
due to oops @ 0x0
kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
[3]kdb&amp;gt; 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="30548">LU-6696</key>
            <summary>ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="mhanafi">Mahmoud Hanafi</reporter>
                        <labels>
                    </labels>
                <created>Mon, 8 Jun 2015 18:36:27 +0000</created>
                <updated>Wed, 18 Jul 2018 18:10:36 +0000</updated>
                            <resolved>Wed, 13 Jul 2016 18:08:22 +0000</resolved>
                                    <version>Lustre 2.5.3</version>
                    <version>Lustre 2.8.0</version>
                                    <fixVersion>Lustre 2.9.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>12</watches>
                                                                            <comments>
                            <comment id="117808" author="pjones" created="Mon, 8 Jun 2015 22:42:32 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;Could you please advise here? Is this related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5056&quot; title=&quot;osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 6 changes, 8 in progress, 0 in flight: -5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5056&quot;&gt;&lt;del&gt;LU-5056&lt;/del&gt;&lt;/a&gt; perhaps?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="117857" author="bobijam" created="Tue, 9 Jun 2015 01:35:05 +0000"  >&lt;p&gt;Can I have debug logs?&lt;/p&gt;</comment>
                            <comment id="117865" author="mhanafi" created="Tue, 9 Jun 2015 03:38:01 +0000"  >&lt;p&gt;The MDS crashes, so I can&apos;t provide any debug logs.&lt;/p&gt;

&lt;p&gt;Do you want them from the OSS? Which specific debug settings should I use?&lt;/p&gt;</comment>
                            <comment id="117866" author="bobijam" created="Tue, 9 Jun 2015 03:43:40 +0000"  >&lt;p&gt;I want to know the cause of the llog processing failure. Can you upload logs from OST0024?&lt;/p&gt;</comment>
                            <comment id="117954" author="mhanafi" created="Tue, 9 Jun 2015 18:13:49 +0000"  >&lt;p&gt;I uploaded logs (lbug.OSS1.out.gz) to ftp site: /uploads/LU6696/lbug.OSS1.out.gz&lt;/p&gt;
</comment>
                            <comment id="117998" author="bobijam" created="Wed, 10 Jun 2015 03:55:29 +0000"  >&lt;p&gt;The OSS log shows that it cannot access the MGS service:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;25823:10000000:00000001:4.0:1433873407.254887:0:81369:0:(mgc_request.c:1106:mgc_enqueue()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
25824:10000000:01000000:4.0:1433873407.254888:0:81369:0:(mgc_request.c:1881:mgc_process_log()) Can&apos;t get cfg lock: -5
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Are the MDS and MGS on different nodes, or are the MDT and MGT located on different devices? It seems that the MGS does not respond, so llog processing cannot proceed.&lt;/p&gt;

&lt;p&gt;Another thing to note is that llog_cat_init_and_process() simply discards the return value of llog_process_or_fork(), which makes osp_sync_init() ignore the error from osp_sync_llog_init(). I think this part also needs a fix.&lt;/p&gt;</comment>
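<!--
Illustrative sketch, not part of the original ticket. The debug lines above print each return code three ways, for example "rc=18446744073709551611 : -5 : fffffffffffffffb": the same 64-bit value shown as unsigned decimal, signed decimal, and hex. The Python below shows that conversion under stated assumptions: decode_rc and LINUX_ERRNO_NAMES are hypothetical names, not Lustre APIs, and the table only covers the Linux errno values that appear in this ticket.

```python
# Hypothetical helper, not part of Lustre: decode an rc that was logged as
# an unsigned 64-bit integer back into its signed value and errno name.

# Small fixed table of the Linux errno values seen in this ticket.
LINUX_ERRNO_NAMES = {5: "EIO", 22: "EINVAL", 114: "EALREADY", 115: "EINPROGRESS"}


def decode_rc(raw):
    """Return (signed_rc, errno_name) for an rc logged as unsigned 64-bit.

    errno_name is None when the value is not negative or is not in the table.
    """
    if not 0 <= raw < 2**64:
        raise ValueError("rc must fit in 64 bits")
    # Two's complement: values with the top bit set represent negatives.
    signed = raw - 2**64 if raw >= 2**63 else raw
    name = LINUX_ERRNO_NAMES.get(abs(signed)) if signed < 0 else None
    return signed, name


print(decode_rc(18446744073709551611))  # (-5, 'EIO')
print(decode_rc(0xFFFFFFFFFFFFFFFB))    # (-5, 'EIO')
```

Read this way, the long decimal number in the trace is the same value (-5, EIO) that appears in the assertion message.
-->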
                            <comment id="118000" author="mhanafi" created="Wed, 10 Jun 2015 04:46:52 +0000"  >&lt;p&gt;The MDS and MGS were located on the same device when we got the errors. As part of debugging, I moved the MGS and MDT to different devices and ran tunefs.lustre --writeconf, but got the same error.&lt;/p&gt;

&lt;p&gt;What is the fix for osp_sync_llog_init()?&lt;/p&gt;</comment>
                            <comment id="118001" author="mhanafi" created="Wed, 10 Jun 2015 05:05:36 +0000"  >&lt;p&gt;By the way, only this single OST is causing the LBUG.&lt;/p&gt;
</comment>
                            <comment id="118002" author="mhanafi" created="Wed, 10 Jun 2015 05:12:23 +0000"  >&lt;p&gt;I uploaded debug logs from the MDT:&lt;br/&gt;
ftp:/uploads/LU6696/lustre-log.1433912973.6315.txt&lt;/p&gt;
</comment>
                            <comment id="118007" author="bobijam" created="Wed, 10 Jun 2015 06:44:43 +0000"  >&lt;p&gt;From the MDT log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000040:00000001:3.0:1433912973.463529:0:6221:0:(llog_osd.c:542:llog_osd_next_block()) Process entered
00000040:00001000:3.0:1433912973.463531:0:6221:0:(llog_osd.c:551:llog_osd_next_block()) looking for log index 61 (cur idx 0 off 8192)
00000040:00000001:1.0:1433912973.463674:0:6221:0:(llog_osd.c:652:llog_osd_next_block()) Process leaving via out (rc=0 : 0 : 0x0)
00000040:00000001:1.0:1433912973.463676:0:6221:0:(lustre_log.h:520:llog_next_block()) Process leaving (rc=0 : 0 : 0)
00000040:00001000:1.0:1433912973.463678:0:6221:0:(llog.c:336:llog_process_thread()) processing rec 0xffff88035dcde000 type 0x10645539
00000040:00001000:1.0:1433912973.463680:0:6221:0:(llog.c:342:llog_process_thread()) after swabbing, type=0x10645539 idx=0
00000040:00000001:1.0:1433912973.463682:0:6221:0:(llog.c:347:llog_process_thread()) Process leaving via repeat (rc=0 : 0 : 0x0)
00000040:00001000:1.0:1433912973.463685:0:6221:0:(llog.c:318:llog_process_thread()) index: 61 last_index 64767
00000040:00000001:1.0:1433912973.463687:0:6221:0:(lustre_log.h:510:llog_next_block()) Process entered
00000040:00000001:1.0:1433912973.463688:0:6221:0:(llog_osd.c:542:llog_osd_next_block()) Process entered
00000040:00001000:1.0:1433912973.463689:0:6221:0:(llog_osd.c:551:llog_osd_next_block()) looking for log index 61 (cur idx 63 off 12224)
00000040:00000001:1.0:1433912973.463692:0:6221:0:(llog_osd.c:654:llog_osd_next_block()) Process leaving via out (rc=18446744073709551611 : -5 : 0xfffffffffffffffb)
00000040:00000001:1.0:1433912973.463694:0:6221:0:(lustre_log.h:520:llog_next_block()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
00000040:00000001:1.0:1433912973.463696:0:6221:0:(llog.c:326:llog_process_thread()) Process leaving via out (rc=18446744073709551611 : -5 : 0xfffffffffffffffb)
00000040:00000010:1.0:1433912973.463699:0:6221:0:(llog.c:402:llog_process_thread()) kfreed &apos;buf&apos;: 8192 at ffff88035dcde000.
00000040:00000010:1.0:1433912973.463701:0:6221:0:(llog.c:480:llog_process_or_fork()) kfreed &apos;lpi&apos;: 80 at ffff88035e1f51c0.
00000040:00000001:1.0:1433912973.463704:0:6221:0:(llog.c:481:llog_process_or_fork()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
00000040:00020000:1.0:1433912973.463706:0:6221:0:(llog_cat.c:866:llog_cat_init_and_process()) hw_nb-OST0024-osc-MDT0000: llog_process() with cat_cancel_cb failed: rc = -5
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can see that the llog for this device (OST0024) is corrupted: while looking for llog index 61, processing skipped ahead to index 63 and found that the read offset (12224) is beyond the llog file size, so -EIO is returned.&lt;/p&gt;

&lt;p&gt;Tappro, how can I disable llog processing for this OST device?&lt;/p&gt;</comment>
                            <comment id="118031" author="tappro" created="Wed, 10 Jun 2015 14:00:53 +0000"  >&lt;p&gt;There is no way to stop llog processing other than removing the llog by hand. Meanwhile, record type 0x10645539 is the llog header, and it is correct that its lrh_index is 0, so the llog is probably not corrupted. The question is why the header lies at offset 8192, or whether the block was somehow read from offset 0. Do you know the llog ID? With it, the file can be found in the /O/ directory and analyzed; either way, we have to find it. Don&apos;t remove this llog: we first need its header with all the data in it.&lt;/p&gt;</comment>
                            <comment id="118035" author="bobijam" created="Wed, 10 Jun 2015 14:24:31 +0000"  >&lt;p&gt;00000004:00000040:1.0:1433912973.463121:0:6221:0:(osp_sync.c:949:osp_sync_llog_init()) hw_nb-OST0024-osc-MDT0000: Init llog for 36 - catid 0xcb2000d:1:9c396a65&lt;/p&gt;

&lt;p&gt;So the llog ID is 0xcb2000d:1:9c396a65?&lt;/p&gt;

&lt;p&gt;In llog_process_thread(), cur_offset is initialized to LLOG_CHUNK_SIZE, so the block was read from offset 8192. I am not very familiar with the llog code, though.&lt;/p&gt;</comment>
                            <comment id="118038" author="mhanafi" created="Wed, 10 Jun 2015 14:34:31 +0000"  >&lt;p&gt;How do I get the llog ID?&lt;/p&gt;</comment>
                            <comment id="118039" author="mhanafi" created="Wed, 10 Jun 2015 14:37:16 +0000"  >&lt;p&gt;nbphw-mds /mnt/lustre/hw_mdt/OBJECTS # llog_reader cb2000d:9c396a65&lt;br/&gt;
rec #0 type=10645539 len=8192&lt;br/&gt;
The log is corrupt (too big at 0)&lt;br/&gt;
Could not pack buffer; rc=-22&lt;/p&gt;
</comment>
                            <comment id="118101" author="adilger" created="Wed, 10 Jun 2015 18:40:25 +0000"  >&lt;p&gt;It should be possible to improve the error handling in this code so that it isn&apos;t an LASSERT(), and instead returns an error to the caller.  We shouldn&apos;t have LASSERT() checks on data that comes from the disk.&lt;/p&gt;</comment>
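<!--
Illustrative sketch, not part of the original ticket. The comment above argues that data read from disk should be validated and reported as an error to the caller rather than checked with LASSERT(). The toy Python model below shows that pattern only; process_records, start_sync, and the record layout are hypothetical and do not mirror the Lustre llog or OSP code.

```python
# Toy model only: names and record layout are hypothetical, not Lustre code.

EIO = 5  # Linux errno value for an I/O error, matching the -5 in this ticket


def process_records(records, first_index, callback):
    """Process records in order; return 0 on success or -EIO on bad data.

    The input is treated as untrusted, as if it had been read from disk,
    so an unexpected index is reported to the caller instead of asserted.
    """
    expected = first_index
    for rec in records:
        if rec.get("index") != expected:
            # An assert here would turn corrupted on-disk data into a crash.
            return -EIO
        callback(rec)
        expected += 1
    return 0


def start_sync(records, first_index):
    """Caller that propagates the error instead of crashing."""
    seen = []
    rc = process_records(records, first_index, seen.append)
    return rc, seen


good = [{"index": 62}, {"index": 63}, {"index": 64}]
print(start_sync(good, 62)[0])              # 0
print(start_sync([{"index": 63}], 61)[0])   # -5
```

The design point is that the caller decides how to degrade (skip the log, retry, or fail the mount) because it receives an error code instead of a panic.
-->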
                            <comment id="118133" author="tappro" created="Wed, 10 Jun 2015 20:39:00 +0000"  >&lt;p&gt;Can you post that llog file here, please?&lt;/p&gt;</comment>
                            <comment id="118134" author="tappro" created="Wed, 10 Jun 2015 20:42:49 +0000"  >&lt;p&gt;It looks like the llog has another (or the same) header written at offset 8192. That is wrong, and I&apos;d like to investigate how that was possible.&lt;/p&gt;

&lt;p&gt;Andreas, I agree: the OSP code is quite aggressive towards possible I/O errors.&lt;/p&gt;</comment>
                            <comment id="118140" author="mhanafi" created="Wed, 10 Jun 2015 21:18:39 +0000"  >&lt;p&gt;Attached the llog file to the LU.&lt;/p&gt;</comment>
                            <comment id="118169" author="tappro" created="Thu, 11 Jun 2015 06:10:10 +0000"  >&lt;p&gt;&lt;del&gt;Mahmoud, the llog looks empty, can you upload it again and gzip it before, please?&lt;/del&gt;&lt;br/&gt;
That was my browser issue, false alarm. I have the file.&lt;/p&gt;</comment>
                            <comment id="118186" author="tappro" created="Thu, 11 Jun 2015 11:02:16 +0000"  >&lt;p&gt;Well, the llog is corrupted in some strange way. I&apos;ve found that the llog contained 4 records with indices 61, 62, 63 and 64, while the llog itself contains only 3 records: 62, 63 and 64. Everything before those records is just garbage. I&apos;ve fixed the llog manually, so it looks healthy now and contains those three records:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;# lustre/utils/llog_reader cb2000d_9c396a65_fixed 
Bit 0 of 3 not set
rec #62 type=1064553b len=64
rec #63 type=1064553b len=64
rec #64 type=1064553b len=64
Header size : 8192
Time : Fri Nov  7 09:00:21 2008
&lt;span class=&quot;code-object&quot;&gt;Number&lt;/span&gt; of records: 3
Target uuid :  
-----------------------
#62 (064)ogen=0 name=0x3bf:1
#63 (064)ogen=0 name=0x419:1
#64 (064)ogen=0 name=0x448:1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That might help revive the MDS, with access to at least those plain llogs.&lt;/p&gt;</comment>
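<!--
Illustrative sketch, not part of the original ticket. The llog_reader output above lists surviving records as lines such as "rec #62 type=1064553b len=64". The Python below parses lines of that shape and reports which indices in an expected range are absent. It assumes only the line format visible in this ticket; parse_records and missing_indices are hypothetical helper names, not part of any Lustre tool.

```python
import re

# Matches lines such as "rec #62 type=1064553b len=64" as shown in the ticket.
REC_LINE = re.compile(r"^rec #(\d+) type=([0-9a-fA-F]+) len=(\d+)$")


def parse_records(output):
    """Return a list of (index, type, length) tuples from llog_reader-style text."""
    records = []
    for line in output.splitlines():
        match = REC_LINE.match(line.strip())
        if match:
            index, rec_type, length = match.groups()
            records.append((int(index), int(rec_type, 16), int(length)))
    return records


def missing_indices(records, first, last):
    """List the indices in [first, last] that have no record."""
    present = {index for index, _type, _length in records}
    return [i for i in range(first, last + 1) if i not in present]


SAMPLE = """\
Bit 0 of 3 not set
rec #62 type=1064553b len=64
rec #63 type=1064553b len=64
rec #64 type=1064553b len=64
Header size : 8192
"""

records = parse_records(SAMPLE)
print([index for index, _type, _length in records])  # [62, 63, 64]
print(missing_indices(records, 61, 64))              # [61]
```

With the sample taken from the fixed catalog, the only absent index in the range 61 to 64 is 61, which matches the description in the comment above.
-->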
                            <comment id="118297" author="jaylan" created="Thu, 11 Jun 2015 22:43:29 +0000"  >&lt;p&gt;My manager asks to raise the priority (currently 3) because the production filesystem is not available.&lt;/p&gt;</comment>
                            <comment id="118317" author="bobijam" created="Fri, 12 Jun 2015 01:46:57 +0000"  >&lt;p&gt;Mikhail, can you upload the llog?&lt;/p&gt;</comment>
                            <comment id="118330" author="tappro" created="Fri, 12 Jun 2015 08:25:55 +0000"  >&lt;p&gt;Uploaded the fixed llog catalog file.&lt;/p&gt;</comment>
                            <comment id="118335" author="gerrit" created="Fri, 12 Jun 2015 09:10:43 +0000"  >&lt;p&gt;Mike Pershin (mike.pershin@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/15245&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/15245&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6696&quot; title=&quot;ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6696&quot;&gt;&lt;del&gt;LU-6696&lt;/del&gt;&lt;/a&gt; llog: tool to fix corrupted llog catalog&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 000ed45ddda77930f00cf3d2791e9af703aab95d&lt;/p&gt;</comment>
                            <comment id="118339" author="gerrit" created="Fri, 12 Jun 2015 10:16:31 +0000"  >&lt;p&gt;Mike Pershin (mike.pershin@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/15247&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/15247&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6696&quot; title=&quot;ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6696&quot;&gt;&lt;del&gt;LU-6696&lt;/del&gt;&lt;/a&gt; llog: tool to fix corrupted llog catalog&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_5&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 289722425f1195bc2438cd65dd9bcfa0243fcf46&lt;/p&gt;</comment>
                            <comment id="118341" author="tappro" created="Fri, 12 Jun 2015 10:20:19 +0000"  >&lt;p&gt;I&apos;ve just pushed a tool to fix a corrupted llog catalog; it might be helpful in such cases to revive the server quickly. Of course, we could just destroy the corrupted llog, but that would leave orphaned plain llogs. So in any case the tool is useful for showing all valid records, to find the IDs of the plain llogs to delete.&lt;/p&gt;</comment>
                            <comment id="118474" author="mhanafi" created="Sun, 14 Jun 2015 05:36:13 +0000"  >&lt;p&gt;I fixed the corrupted llog catalog and was able to mount the filesystem. You may lower the priority.&lt;/p&gt;
</comment>
                            <comment id="118484" author="pjones" created="Sun, 14 Jun 2015 12:47:30 +0000"  >&lt;p&gt;ok - thanks Mahmoud&lt;/p&gt;</comment>
                            <comment id="118491" author="adilger" created="Sun, 14 Jun 2015 16:43:09 +0000"  >&lt;p&gt;Mike, thanks for getting this tool working for the customer so quickly. &lt;/p&gt;

&lt;p&gt;It would be more useful in the long run if the kernel llog code would just skip the corrupted records itself, and/or have LFSCK repair them before use. That would allow the filesystem to keep working, rather than taking an outage and requiring the admin to figure out that such a tool exists and then run it.&lt;/p&gt;</comment>
                            <comment id="127119" author="sarah" created="Fri, 11 Sep 2015 18:23:48 +0000"  >&lt;p&gt;Hit this bug on the master branch: replay-single test_60 failed (lustre-master build #3175, RHEL7, DNE).&lt;/p&gt;


&lt;p&gt;&lt;a href=&quot;https://testing.hpdd.intel.com/test_logs/d35a0490-54ed-11e5-9cd2-5254006e85c2/show_text&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_logs/d35a0490-54ed-11e5-9cd2-5254006e85c2/show_text&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;llog unlink ================================ 15:49:47 \(1441468187\)
15:51:00:[14402.025502] Lustre: DEBUG MARKER: == replay-single test 60: test llog post recovery init vs llog unlink ================================ 15:49:47 (1441468187)
15:51:00:[14402.616182] Lustre: DEBUG MARKER: sync; sync; sync
15:51:00:[14403.619443] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
15:51:00:[14403.864368] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
15:51:00:[14403.990255] Turning device dm-0 (0xfc00000) read-only
15:51:00:[14404.116309] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
15:51:00:[14404.237332] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
15:51:00:[14404.649105] Lustre: DEBUG MARKER: grep -c /mnt/mds1&apos; &apos; /proc/mounts
15:51:00:[14404.886835] Lustre: DEBUG MARKER: umount -d /mnt/mds1
15:51:00:[14411.183415] Removing read-only on unknown block (0xfc00000)
15:51:00:[14411.343626] Lustre: DEBUG MARKER: lsmod | grep lnet &amp;gt; /dev/null &amp;amp;&amp;amp; lctl dl | grep &apos; ST &apos;
15:51:00:[14421.656194] Lustre: DEBUG MARKER: hostname
15:51:00:[14421.954993] Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
15:51:00:[14422.192567] Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre   		                   /dev/lvm-Role_MDS/P1 /mnt/mds1
15:51:00:[14422.457907] LDISKFS-fs (dm-0): recovery complete
15:51:00:[14422.469317] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache
15:51:00:[14422.622381] Lustre: lustre-MDT0000-o: trigger OI scrub by RPC for [0x1:0x21a:0x0], rc = 0 [1]
15:51:00:[14422.623847] LustreError: 22791:0:(llog_cat.c:171:llog_cat_id2handle()) lustre-OST0000-osc-MDT0000: error opening log id 0x21a:1:0: rc = -115
15:51:00:[14422.625103] LustreError: 22791:0:(llog_cat.c:545:llog_cat_process_cb()) lustre-OST0000-osc-MDT0000: cannot find handle for llog 0x21a:1: -115
15:51:00:[14422.626273] LustreError: 22791:0:(osp_sync.c:1132:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 1 in progress, 0 in flight: -115
15:51:00:[14422.627634] LustreError: 22791:0:(osp_sync.c:1132:osp_sync_thread()) LBUG
15:51:00:[14422.628266] Pid: 22791, comm: osp-syn-0-0
15:51:00:[14422.628640] 
15:51:00:[14422.628640] Call Trace:
15:51:00:[14422.629041]  [&amp;lt;ffffffffa06257d3&amp;gt;] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
15:51:00:[14422.629675]  [&amp;lt;ffffffffa0625d75&amp;gt;] lbug_with_loc+0x45/0xc0 [libcfs]
15:51:00:[14422.630265]  [&amp;lt;ffffffffa0f532ea&amp;gt;] osp_sync_thread+0x7fa/0x8f0 [osp]
15:51:00:[14422.630848]  [&amp;lt;ffffffff810125f6&amp;gt;] ? __switch_to+0x136/0x4a0
15:51:00:[14422.631365]  [&amp;lt;ffffffffa0f52af0&amp;gt;] ? osp_sync_thread+0x0/0x8f0 [osp]
15:51:00:[14422.631941]  [&amp;lt;ffffffff8109739f&amp;gt;] kthread+0xcf/0xe0
15:51:00:[14422.632393]  [&amp;lt;ffffffff810972d0&amp;gt;] ? kthread+0x0/0xe0
15:51:00:[14422.632866]  [&amp;lt;ffffffff81615018&amp;gt;] ret_from_fork+0x58/0x90
15:51:00:[14422.633368]  [&amp;lt;ffffffff810972d0&amp;gt;] ? kthread+0x0/0xe0
15:51:00:[14422.633818] 
15:51:00:[14422.635637] Kernel panic - not syncing: LBUG
15:51:00:[14422.636020] CPU: 0 PID: 22791 Comm: osp-syn-0-0 Tainted: GF          O--------------   3.10.0-229.7.2.el7_lustre.gea2bb60.x86_64 #1
15:51:00:[14422.636020] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
15:51:00:[14422.636020]  ffffffffa0642ecf 0000000071490b94 ffff88006e21fd28 ffffffff816051aa
15:51:00:[14422.636020]  ffff88006e21fda8 ffffffff815fea1e ffffffff00000008 ffff88006e21fdb8
15:51:00:[14422.636020]  ffff88006e21fd58 0000000071490b94 ffffffffa0f63e60 0000000000000246
15:51:00:[14422.636020] Call Trace:
15:51:00:[14422.636020]  [&amp;lt;ffffffff816051aa&amp;gt;] dump_stack+0x19/0x1b
15:51:00:[14422.636020]  [&amp;lt;ffffffff815fea1e&amp;gt;] panic+0xd8/0x1e7
15:51:00:[14422.636020]  [&amp;lt;ffffffffa0625ddb&amp;gt;] lbug_with_loc+0xab/0xc0 [libcfs]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa0f532ea&amp;gt;] osp_sync_thread+0x7fa/0x8f0 [osp]
15:51:00:[14422.636020]  [&amp;lt;ffffffff810125f6&amp;gt;] ? __switch_to+0x136/0x4a0
15:51:00:[14422.636020]  [&amp;lt;ffffffffa0f52af0&amp;gt;] ? osp_sync_process_queues+0x1660/0x1660 [osp]
15:51:00:[14422.636020]  [&amp;lt;ffffffff8109739f&amp;gt;] kthread+0xcf/0xe0
15:51:00:[14422.636020]  [&amp;lt;ffffffff810972d0&amp;gt;] ? kthread_create_on_node+0x140/0x140
15:51:00:[14422.636020]  [&amp;lt;ffffffff81615018&amp;gt;] ret_from_fork+0x58/0x90
15:51:00:[14422.636020]  [&amp;lt;ffffffff810972d0&amp;gt;] ? kthread_create_on_node+0x140/0x140
15:51:00:[14422.636020] drm_kms_helper: panic occurred, switching back to text console
15:51:00:[14422.636020] ------------[ cut here ]------------
15:51:00:[14422.636020] kernel BUG at arch/x86/mm/pageattr.c:216!
15:51:00:[14422.636020] invalid opcode: 0000 [#1] SMP 
15:51:00:[14422.636020] Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) sha512_generic libcfs(OF) ldiskfs(OF) dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr serio_raw virtio_balloon parport_pc i2c_piix4 parport ext4 mbcache jbd2 ata_generic pata_acpi cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper virtio_blk ttm 8139too drm ata_piix 8139cp mii virtio_pci virtio_ring virtio i2c_core libata floppy
15:51:00:[14422.636020] CPU: 0 PID: 22791 Comm: osp-syn-0-0 Tainted: GF          O--------------   3.10.0-229.7.2.el7_lustre.gea2bb60.x86_64 #1
15:51:00:[14422.636020] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
15:51:00:[14422.636020] task: ffff88007acb6660 ti: ffff88006e21c000 task.ti: ffff88006e21c000
15:51:00:[14422.636020] RIP: 0010:[&amp;lt;ffffffff8105c2ef&amp;gt;]  [&amp;lt;ffffffff8105c2ef&amp;gt;] change_page_attr_set_clr+0x4ef/0x500
15:51:00:[14422.636020] RSP: 0018:ffff88006e21f530  EFLAGS: 00010046
15:51:00:[14422.636020] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000010
15:51:00:[14422.636020] RDX: 0000000000002000 RSI: 0000000000000000 RDI: 0000000080000000
15:51:00:[14422.636020] RBP: ffff88006e21f5c8 R08: 0000000000000004 R09: 000000000006dcf8
15:51:00:[14422.636020] R10: 0000000000003689 R11: ffffffff8118ff6f R12: 0000000000000010
15:51:00:[14422.636020] R13: 0000000000000000 R14: 0000000000000200 R15: 0000000000000005
15:51:00:[14422.636020] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
15:51:00:[14422.636020] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
15:51:00:[14422.636020] CR2: 00007fb7152a8220 CR3: 000000000190e000 CR4: 00000000000006f0
15:51:00:[14422.636020] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
15:51:00:[14422.636020] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
15:51:00:[14422.636020] Stack:
15:51:00:[14422.636020]  00000004b3216d38 ffff880000000000 0000000000000000 ffff88006b62c000
15:51:00:[14422.636020]  ffff88007acb6660 0000000000000000 0000000000000000 0000000000000010
15:51:00:[14422.636020]  0000000000000000 0000000500000001 000000000006dcf8 0000020000000000
15:51:00:[14422.636020] Call Trace:
15:51:00:[14422.636020]  [&amp;lt;ffffffff8105c646&amp;gt;] _set_pages_array+0xe6/0x130
15:51:00:[14422.636020]  [&amp;lt;ffffffff8105c6c3&amp;gt;] set_pages_array_wc+0x13/0x20
15:51:00:[14422.636020]  [&amp;lt;ffffffffa01133af&amp;gt;] ttm_set_pages_caching+0x2f/0x70 [ttm]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa01134f4&amp;gt;] ttm_alloc_new_pages.isra.7+0xb4/0x180 [ttm]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa0113e50&amp;gt;] ttm_pool_populate+0x3e0/0x500 [ttm]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa013132e&amp;gt;] cirrus_ttm_tt_populate+0xe/0x10 [cirrus]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa01106dd&amp;gt;] ttm_bo_move_memcpy+0x65d/0x6e0 [ttm]
15:51:00:[14422.636020]  [&amp;lt;ffffffff8118f73e&amp;gt;] ? map_vm_area+0x2e/0x40
15:51:00:[14422.636020]  [&amp;lt;ffffffffa010c2c9&amp;gt;] ? ttm_tt_init+0x69/0xb0 [ttm]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa01312d8&amp;gt;] cirrus_bo_move+0x18/0x20 [cirrus]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa010dde5&amp;gt;] ttm_bo_handle_move_mem+0x265/0x5b0 [ttm]
15:51:00:[14422.636020]  [&amp;lt;ffffffff81601a64&amp;gt;] ? __slab_free+0x10e/0x277
15:51:00:[14422.636020]  [&amp;lt;ffffffffa010e74a&amp;gt;] ? ttm_bo_mem_space+0x10a/0x310 [ttm]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa010ee17&amp;gt;] ttm_bo_validate+0x247/0x260 [ttm]
15:51:00:[14422.636020]  [&amp;lt;ffffffff81059e69&amp;gt;] ? iounmap+0x79/0xa0
15:51:00:[14422.636020]  [&amp;lt;ffffffff81050000&amp;gt;] ? kgdb_arch_late+0x80/0x180
15:51:00:[14422.636020]  [&amp;lt;ffffffffa0131ac2&amp;gt;] cirrus_bo_push_sysram+0x82/0xe0 [cirrus]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa012fc84&amp;gt;] cirrus_crtc_do_set_base.isra.8.constprop.10+0x84/0x430 [cirrus]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa0130479&amp;gt;] cirrus_crtc_mode_set+0x449/0x4d0 [cirrus]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa00ee939&amp;gt;] drm_crtc_helper_set_mode+0x2e9/0x520 [drm_kms_helper]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa00ef6bf&amp;gt;] drm_crtc_helper_set_config+0x87f/0xaa0 [drm_kms_helper]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa00af711&amp;gt;] drm_mode_set_config_internal+0x61/0xe0 [drm]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa00f6e83&amp;gt;] restore_fbdev_mode+0xb3/0xe0 [drm_kms_helper]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa00f7045&amp;gt;] drm_fb_helper_force_kernel_mode+0x75/0xb0 [drm_kms_helper]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa00f7d59&amp;gt;] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
15:51:00:[14422.636020]  [&amp;lt;ffffffff81610a6c&amp;gt;] notifier_call_chain+0x4c/0x70
15:51:00:[14422.636020]  [&amp;lt;ffffffff81610aca&amp;gt;] atomic_notifier_call_chain+0x1a/0x20
15:51:00:[14422.636020]  [&amp;lt;ffffffff815fea4c&amp;gt;] panic+0x106/0x1e7
15:51:00:[14422.636020]  [&amp;lt;ffffffffa0625ddb&amp;gt;] lbug_with_loc+0xab/0xc0 [libcfs]
15:51:00:[14422.636020]  [&amp;lt;ffffffffa0f532ea&amp;gt;] osp_sync_thread+0x7fa/0x8f0 [osp]
15:51:00:[14422.636020]  [&amp;lt;ffffffff810125f6&amp;gt;] ? __switch_to+0x136/0x4a0
15:51:00:[14422.636020]  [&amp;lt;ffffffffa0f52af0&amp;gt;] ? osp_sync_process_queues+0x1660/0x1660 [osp]
15:51:00:[14422.636020]  [&amp;lt;ffffffff8109739f&amp;gt;] kthread+0xcf/0xe0
15:51:00:[14422.636020]  [&amp;lt;ffffffff810972d0&amp;gt;] ? kthread_create_on_node+0x140/0x140
15:51:00:[14422.636020]  [&amp;lt;ffffffff81615018&amp;gt;] ret_from_fork+0x58/0x90
15:51:00:[14422.636020]  [&amp;lt;ffffffff810972d0&amp;gt;] ? kthread_create_on_node+0x140/0x140
16:50:26:********** Timeout by autotest system **********
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="127181" author="tappro" created="Sun, 13 Sep 2015 09:13:23 +0000"  >&lt;p&gt;I wonder about the -115 (EINPROGRESS) error code; I think it comes from obd_fid_alloc(), which may send an RPC to the master MDT. While we do need better error handling in OSP, in this particular case I think it is also wrong to return -EINPROGRESS from the FID/SEQ code at all; it should be handled internally.&lt;/p&gt;</comment>
                            <comment id="148499" author="mhanafi" created="Mon, 11 Apr 2016 20:38:18 +0000"  >&lt;p&gt;We can close this LU.&lt;/p&gt;</comment>
                            <comment id="148513" author="adilger" created="Mon, 11 Apr 2016 23:01:35 +0000"  >&lt;p&gt;It doesn&apos;t appear that either &lt;a href=&quot;http://review.whamcloud.com/15250&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/15250&lt;/a&gt; or &lt;a href=&quot;http://review.whamcloud.com/15247&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/15247&lt;/a&gt; have landed to master.  Are these patches no longer needed (and should be abandoned) because of different patches to master, or do they need to be ported to master?&lt;/p&gt;</comment>
                            <comment id="150471" author="tappro" created="Thu, 28 Apr 2016 14:39:17 +0000"  >&lt;p&gt;Andreas, 15250 is not in master, so it can be ported. 15247 has a master version, 15245, which is tracked under &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7011&quot; title=&quot;Kernel part of llog subsystem can do self-repairing in some cases&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7011&quot;&gt;LU-7011&lt;/a&gt;. It has not landed, but it will not be lost when this ticket is closed.&lt;/p&gt;</comment>
                            <comment id="150475" author="gerrit" created="Thu, 28 Apr 2016 17:00:03 +0000"  >&lt;p&gt;Bobi Jam (bobijam@hotmail.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/19856&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/19856&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6696&quot; title=&quot;ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6696&quot;&gt;&lt;del&gt;LU-6696&lt;/del&gt;&lt;/a&gt; llog: improve error handling&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 12153490536bb3f1049631720b3629de68ad8574&lt;/p&gt;</comment>
                            <comment id="158426" author="gerrit" created="Mon, 11 Jul 2016 23:57:07 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/19856/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/19856/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6696&quot; title=&quot;ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6696&quot;&gt;&lt;del&gt;LU-6696&lt;/del&gt;&lt;/a&gt; llog: improve error handling&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 53d2f414d75ac1302b53017376ca2f1fda1f3d17&lt;/p&gt;</comment>
                            <comment id="158658" author="jgmitter" created="Wed, 13 Jul 2016 18:08:22 +0000"  >&lt;p&gt;Patch has landed to master for 2.9.0.&lt;/p&gt;

&lt;p&gt;The tool patch is being tracked by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7011&quot; title=&quot;Kernel part of llog subsystem can do self-repairing in some cases&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7011&quot;&gt;LU-7011&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="24700">LU-5056</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="43447">LU-9068</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="37490">LU-8252</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="31497">LU-7011</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="18124" name="cb2000d_9c396a65" size="12288" author="mhanafi" created="Wed, 10 Jun 2015 21:20:16 +0000"/>
                            <attachment id="18141" name="cb2000d_9c396a65_fixed" size="12288" author="tappro" created="Fri, 12 Jun 2015 08:25:55 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 29 Apr 2016 18:36:27 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxf67:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 8 Jun 2015 18:36:27 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>