<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:31:44 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10066] A potential bug on OSP setattr handling</title>
                <link>https://jira.whamcloud.com/browse/LU-10066</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I discovered a potential bug in OSP while working on FLR. For FLR to work properly, the MDT sends the layout version to OST objects via a setattr RPC. The same layout version is also sent to the client, which carries it to make its BRW write RPCs legal. The symptom is that after OSP writes the setattr record into the llog, the corresponding osp-sync thread should pick it up later, but that never happens.&lt;/p&gt;

&lt;p&gt;Please find the log and debug patch for this problem attached. Let me explain the log in a bit more detail.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000004:80000000:9.0:1506717693.938247:0:11105:0:(lod_object.c:1152:lod_obj_stripe_attr_set_cb()) [0x100020000:0x1045:0x0]: set layout version: 630, comp_idx: 10
00000004:80000000:9.0:1506717693.938249:0:11105:0:(osp_sync.c:425:osp_sync_add_rec()) [0xd4:0x1:0x0]: set layout version to: 630, rc = 0
..
00000004:80000000:9.0:1506717693.938247:0:11105:0:(lod_object.c:1152:lod_obj_stripe_attr_set_cb()) [0x100020000:0x1045:0x0]: set layout version: 630, comp_idx: 10
00000004:80000000:9.0:1506717693.938249:0:11105:0:(osp_sync.c:425:osp_sync_add_rec()) [0xd4:0x1:0x0]: set layout version to: 630, rc = 0
..
00000004:80000000:9.0:1506717693.938310:0:11105:0:(lod_object.c:1152:lod_obj_stripe_attr_set_cb()) [0x100030000:0x1049:0x0]: set layout version: 630, comp_idx: 11
00000040:00100000:40.0:1506717693.938311:0:9452:0:(llog.c:208:llog_cancel_rec()) Canceling 6230 in log [0xd4:0x1:0x0]
00000004:80000000:9.0:1506717693.938312:0:11105:0:(osp_sync.c:425:osp_sync_add_rec()) [0xd5:0x1:0x0]: set layout version to: 630, rc = 0
..
00000004:80000000:9.0:1506717693.938390:0:11105:0:(lod_object.c:1152:lod_obj_stripe_attr_set_cb()) [0x100000000:0x104f:0x0]: set layout version: 630, comp_idx: 11
00000004:80000000:9.0:1506717693.938391:0:11105:0:(osp_sync.c:425:osp_sync_add_rec()) [0xd3:0x1:0x0]: set layout version to: 630, rc = 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Thread 11105 wrote 4 records intended to set the layout version on 4 OSTs. All writes succeeded (rc = 0).&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000004:00100000:33.0:1506717694.004531:0:9454:0:(osp_sync.c:772:osp_sync_new_setattr_job()) @@@ lustre-OST0002-osc-MDT0000: [0x1045:0x0:0x0]: set layout version: 630
  req@ffff880797eec800 x1579668955411600/t0(0) o2-&amp;gt;lustre-OST0002-osc-MDT0000@10.8.1.68@tcp:28/4 lens 560/432 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
00000004:00100000:69.0:1506717694.004532:0:9721:0:(osp_sync.c:772:osp_sync_new_setattr_job()) @@@ lustre-OST0003-osc-MDT0000: [0x1049:0x0:0x0]: set layout version: 630
  req@ffff880fdb70a400 x1579668955411616/t0(0) o2-&amp;gt;lustre-OST0003-osc-MDT0000@10.8.1.68@tcp:28/4 lens 560/432 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
00000004:00100000:40.0:1506717694.004534:0:9452:0:(osp_sync.c:772:osp_sync_new_setattr_job()) @@@ lustre-OST0001-osc-MDT0000: [0x1048:0x0:0x0]: set layout version: 630
  req@ffff880fbd6ef200 x1579668955411632/t0(0) o2-&amp;gt;lustre-OST0001-osc-MDT0000@10.8.1.68@tcp:28/4 lens 560/432 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Later, three of those records were picked up by the corresponding sync threads: 9454, 9721, and 9452. I have verified that those OSTs received the setattr RPCs.&lt;/p&gt;

&lt;p&gt;The problem is the last record, which was supposed to be sent to OST0000 but never was. That request should have been picked up by thread 9450:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[jinxiong@wolf-68 ~]$ ps ax | grep osp-syn
  9450 ?        S      0:53 [osp-syn-0-0]
  9452 ?        S      0:53 [osp-syn-1-0]
  9454 ?        S      0:54 [osp-syn-2-0]
  9721 ?        S      0:54 [osp-syn-3-0]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The stack trace of thread 9450:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[562512.630879] osp-syn-0-0     S ffff8808594dc2d8     0  9450      2 0x00000080
[562512.639451]  ffff88084461f9b8 0000000000000046 ffff88082279de20 ffff88084461ffd8
[562512.648442]  ffff88084461ffd8 ffff88084461ffd8 ffff88082279de20 0000000000002000
[562512.657438]  ffff8808594dc000 ffff8808594dc2e8 ffff881051a8c058 ffff8808594dc2d8
[562512.666447] Call Trace:                                                     
[562512.669845]  [&amp;lt;ffffffff8168c969&amp;gt;] schedule+0x29/0x70                        
[562512.676115]  [&amp;lt;ffffffffa13edd11&amp;gt;] osp_sync_process_queues+0x1641/0x20e0 [osp]
[562512.684729]  [&amp;lt;ffffffff810c54e0&amp;gt;] ? wake_up_state+0x20/0x20                 
[562512.691574]  [&amp;lt;ffffffffa0a03565&amp;gt;] llog_process_thread+0x5a5/0x1180 [obdclass]
[562512.700289]  [&amp;lt;ffffffffa13ec6d0&amp;gt;] ? osp_sync_thread+0x9d0/0x9d0 [osp]       
[562512.708235]  [&amp;lt;ffffffffa0a041fc&amp;gt;] llog_process_or_fork+0xbc/0x450 [obdclass]
[562512.716770]  [&amp;lt;ffffffffa0a0972a&amp;gt;] llog_cat_process_cb+0x20a/0x220 [obdclass]
[562512.725343]  [&amp;lt;ffffffffa0a03565&amp;gt;] llog_process_thread+0x5a5/0x1180 [obdclass]
[562512.734025]  [&amp;lt;ffffffff810cabe4&amp;gt;] ? select_task_rq_fair+0x584/0x720         
[562512.741645]  [&amp;lt;ffffffffa0a09520&amp;gt;] ? llog_cat_process_common+0x440/0x440 [obdclass]
[562512.750778]  [&amp;lt;ffffffffa0a041fc&amp;gt;] llog_process_or_fork+0xbc/0x450 [obdclass]
[562512.759354]  [&amp;lt;ffffffffa0a09520&amp;gt;] ? llog_cat_process_common+0x440/0x440 [obdclass]
[562512.768492]  [&amp;lt;ffffffffa0a086a9&amp;gt;] llog_cat_process_or_fork+0x199/0x2a0 [obdclass]
[562512.777520]  [&amp;lt;ffffffff810c54f2&amp;gt;] ? default_wake_function+0x12/0x20         
[562512.785286]  [&amp;lt;ffffffff810ba628&amp;gt;] ? __wake_up_common+0x58/0x90              
[562512.792479]  [&amp;lt;ffffffffa13ec6d0&amp;gt;] ? osp_sync_thread+0x9d0/0x9d0 [osp]       
[562512.800382]  [&amp;lt;ffffffffa0a087de&amp;gt;] llog_cat_process+0x2e/0x30 [obdclass]     
[562512.808440]  [&amp;lt;ffffffffa13ebefa&amp;gt;] osp_sync_thread+0x1fa/0x9d0 [osp]         
[562512.816175]  [&amp;lt;ffffffff81029569&amp;gt;] ? __switch_to+0xd9/0x4c0                  
[562512.823041]  [&amp;lt;ffffffffa13ebd00&amp;gt;] ? osp_sync_process_committed+0x6c0/0x6c0 [osp]
[562512.832031]  [&amp;lt;ffffffff810b0a4f&amp;gt;] kthread+0xcf/0xe0 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;(gdb) l *(osp_sync_process_queues+0x1641)
0x17d41 is in osp_sync_process_queues (/home/jinxiong/work/flr/lustre/osp/osp_sync.c:1151).
1146			}
1147	
1148			&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (d-&amp;gt;opd_sync_last_processed_id == d-&amp;gt;opd_sync_last_used_id)
1149				osp_sync_remove_from_tracker(d);
1150	
1151			l_wait_event(d-&amp;gt;opd_sync_waitq,
1152				     !osp_sync_running(d) ||
1153				     osp_sync_can_process_new(d, rec) ||
1154				     !list_empty(&amp;amp;d-&amp;gt;opd_sync_committed_there),
1155				     &amp;amp;lwi);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The thread was sleeping in l_wait_event() and ignoring the new request.&lt;/p&gt;

&lt;p&gt;I suspect that one of the checks in osp_sync_can_process_new() made it return 0, which kept the thread from waking up:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; inline &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; osp_sync_can_process_new(struct osp_device *d,                
                                           struct llog_rec_hdr *rec)            
{                                                                               
        LASSERT(d);                                                             
                                                                                
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (unlikely(atomic_read(&amp;amp;d-&amp;gt;opd_sync_barrier) &amp;gt; 0))                    
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;                                                       
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (unlikely(osp_sync_in_flight_conflict(d, rec)))                      
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;                                                       
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!osp_sync_rpcs_in_progress_low(d))                                  
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;                                                       
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!osp_sync_rpcs_in_flight_low(d))                                    
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;                                                       
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!d-&amp;gt;opd_imp_connected)                                              
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;                                                       
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (d-&amp;gt;opd_sync_prev_done == 0)                                         
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 1;                                                       
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (atomic_read(&amp;amp;d-&amp;gt;opd_sync_changes) == 0)                             
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;                                                       
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rec == NULL ||                                                      
            osp_sync_correct_id(d, rec) &amp;lt;= d-&amp;gt;opd_sync_last_committed_id)       
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 1;                                                       
        &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;                                                               
}  
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
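
&lt;p&gt;To narrow down which guard keeps the predicate at 0, the checks can be modelled in user space. The following is a minimal sketch, not Lustre code: the enum, the struct, and the function name are hypothetical, and each input is reduced to a flag. Only the order of the checks follows the function above.&lt;/p&gt;

```c
/* User-space model of the guard order in osp_sync_can_process_new().
 * All names below are hypothetical; this is not the real struct osp_device. */
enum sync_blocker {
	BLOCK_NONE = 0,		/* predicate would return 1 */
	BLOCK_BARRIER,		/* opd_sync_barrier is raised */
	BLOCK_CONFLICT,		/* osp_sync_in_flight_conflict() is true */
	BLOCK_IN_PROGRESS,	/* osp_sync_rpcs_in_progress_low() is false */
	BLOCK_IN_FLIGHT,	/* osp_sync_rpcs_in_flight_low() is false */
	BLOCK_DISCONNECTED,	/* opd_imp_connected is 0 */
	BLOCK_NO_CHANGES,	/* opd_sync_changes is 0 */
	BLOCK_UNCOMMITTED_REC	/* record id is above opd_sync_last_committed_id */
};

/* Snapshot of the inputs, each reduced to a flag (nonzero means true). */
struct sync_snapshot {
	int barrier_raised;
	int in_flight_conflict;
	int rpcs_in_progress_low;
	int rpcs_in_flight_low;
	int imp_connected;
	int prev_done;
	int has_changes;
	int have_rec;
	int rec_committed;	/* corrected record id is at or below the last committed id */
};

/* Walk the guards in the same order as the kernel function and report
 * the first one that would make it return 0. */
static enum sync_blocker first_blocker(struct sync_snapshot s)
{
	if (s.barrier_raised)
		return BLOCK_BARRIER;
	if (s.in_flight_conflict)
		return BLOCK_CONFLICT;
	if (!s.rpcs_in_progress_low)
		return BLOCK_IN_PROGRESS;
	if (!s.rpcs_in_flight_low)
		return BLOCK_IN_FLIGHT;
	if (!s.imp_connected)
		return BLOCK_DISCONNECTED;
	if (!s.prev_done)
		return BLOCK_NONE;	/* early accept, as in the original */
	if (!s.has_changes)
		return BLOCK_NO_CHANGES;
	if (!s.have_rec || s.rec_committed)
		return BLOCK_NONE;
	return BLOCK_UNCOMMITTED_REC;
}
```

&lt;p&gt;With every flag favourable except rec_committed, first_blocker() reports BLOCK_UNCOMMITTED_REC, which corresponds to the final return 0 in the function above.&lt;/p&gt;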

&lt;p&gt;The problem reproduces fairly consistently: I see it after running sanity-pfl:200() for 2 days on average. Please help.&lt;/p&gt;</description>
                <environment></environment>
        <key id="48575">LU-10066</key>
            <summary>A potential bug on OSP setattr handling</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="jay">Jinshan Xiong</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Tue, 3 Oct 2017 19:23:12 +0000</created>
                <updated>Mon, 25 Jul 2022 14:49:50 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="210236" author="jay" created="Tue, 3 Oct 2017 19:32:47 +0000"  >&lt;p&gt;I will leave the code that reproduces this problem untouched. Please let me know if you want to look at it.&lt;/p&gt;</comment>
                            <comment id="210742" author="gerrit" created="Tue, 10 Oct 2017 19:40:12 +0000"  >&lt;p&gt;Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/29550&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/29550&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10066&quot; title=&quot;A potential bug on OSP setattr handling&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10066&quot;&gt;LU-10066&lt;/a&gt; osp: error overflow handling for llog id&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: b2d621a48fd70abd4d119091a7d2bbfeb4dad865&lt;/p&gt;</comment>
                            <comment id="341450" author="JIRAUSER17312" created="Mon, 25 Jul 2022 14:49:50 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=bzzz&quot; class=&quot;user-hover&quot; rel=&quot;bzzz&quot;&gt;bzzz&lt;/a&gt;&#160;&lt;/p&gt;

&lt;p&gt;Can you take a look? Thank you!&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="32474">LU-7251</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="49027">LU-10170</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="28400" name="log.gz" size="10091857" author="jay" created="Tue, 3 Oct 2017 19:32:18 +0000"/>
                            <attachment id="28399" name="p" size="1582" author="jay" created="Tue, 3 Oct 2017 19:32:00 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10092" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>EX-4394</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzl73:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>