<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:08:16 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-563] Client evictions seen when running mdtest and repeated failover</title>
                <link>https://jira.whamcloud.com/browse/LU-563</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have seen client evictions while running mdtest on 55 nodes, with 8 tasks&lt;br/&gt;
per node and continuous OSS failover taking place.&lt;/p&gt;

&lt;p&gt;The command being run is the following:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;srun -v -O -l -D /p/lcrater2/surya1/tmp/ -N 55 -n $((55*8))  -- \
        mdtest /p/lcrater2/surya1/tmp -I 10000 &amp;gt;MDTEST.stdout 2&amp;gt;MDTEST.stderr
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Out of the 3 times this has been run with OSS failover/failback happening in&lt;br/&gt;
the background, none have completed successfully. Two of the runs suffered from&lt;br/&gt;
client evictions, and the third suffered from a node failure (it is unclear&lt;br/&gt;
whether this was due to Lustre or not).&lt;/p&gt;

&lt;p&gt;Also, for what it&apos;s worth, these issues have not yet appeared when failover is&lt;br/&gt;
not running in the background. I have run the above command a couple of times&lt;br/&gt;
without simultaneously failing the OSS nodes, and those runs were able to&lt;br/&gt;
complete successfully.&lt;/p&gt;</description>
                <environment>CHAOS 5, RHEL 6, Lustre 2.0.66-2chaos</environment>
        <key id="11439">LU-563</key>
            <summary>Client evictions seen when running mdtest and repeated failover</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="4">Incomplete</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="prakash">Prakash Surya</reporter>
                        <labels>
                    </labels>
                <created>Tue, 2 Aug 2011 17:37:49 +0000</created>
                <updated>Tue, 13 Dec 2011 11:47:20 +0000</updated>
                            <resolved>Tue, 13 Dec 2011 11:47:20 +0000</resolved>
                                    <version>Lustre 2.1.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="18660" author="prakash" created="Tue, 2 Aug 2011 17:42:04 +0000"  >&lt;p&gt;This archive contains some of the results obtained by running the above&lt;br/&gt;
mentioned tests. The results for each test are contained in their own separate&lt;br/&gt;
directory. Thus, the results for the first test run are contained in directory&lt;br/&gt;
run1, the second test in run2, etc.&lt;/p&gt;

&lt;p&gt;Each directory should contain 4 files:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;MDTEST.cli          - contains the command line which was executed to kick&lt;br/&gt;
                         off the test.&lt;/li&gt;
	&lt;li&gt;MDTEST.stdout       - The stdout output of the command executed.&lt;/li&gt;
	&lt;li&gt;MDTEST.stderr       - The stderr output of the command executed.&lt;/li&gt;
	&lt;li&gt;lustre.log-$DATE.gz - A collection of the Lustre console log messages for&lt;br/&gt;
                         the day the test was run.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Also, the root directory should contain the fail.sh script. This script was&lt;br/&gt;
run alongside the aforementioned tests to stress the failover/failback&lt;br/&gt;
capabilities of the filesystem.&lt;/p&gt;</comment>
                            <comment id="18661" author="morrone" created="Tue, 2 Aug 2011 20:11:21 +0000"  >&lt;p&gt;It would be helpful to enable the rpctrace debug level and enable /proc/sys/lustre/dump_on_eviction.&lt;/p&gt;

&lt;p&gt;Then reproduce the test and collect and examine the lustre logs after the eviction occurs.  With a little luck (and perhaps by enlarging the log buffer), we&apos;ll be able to see when the mds sends the lock callback and trace what happens to it.&lt;/p&gt;

&lt;p&gt;Does the rpc make it to the client?  Does the client try to send the reply?&lt;/p&gt;</comment>
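<!--
    Illustrative sketch (not part of the original ticket): one way the settings suggested in the
    preceding comment could be enabled on a Lustre client. The exact parameter paths are assumed
    to match the Lustre 2.x conventions referenced elsewhere in this ticket.

        # add the rpctrace (and dlmtrace) debug flags to the current debug mask
        lctl set_param debug=+rpctrace
        lctl set_param debug=+dlmtrace

        # dump the debug log automatically when this client is evicted
        echo 1 > /proc/sys/lustre/dump_on_eviction

        # after an eviction, flush the in-memory debug buffer to a file for analysis
        lctl dk /tmp/lustre-debug.$(hostname).log
-->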
                            <comment id="18664" author="pjones" created="Tue, 2 Aug 2011 23:45:53 +0000"  >&lt;p&gt;Oleg will look into this one&lt;/p&gt;</comment>
                            <comment id="18671" author="prakash" created="Wed, 3 Aug 2011 10:36:49 +0000"  >&lt;p&gt;I&apos;ve run the test again twice yesterday and didn&apos;t see the eviction issues. The command was able to complete successfully.&lt;/p&gt;

&lt;p&gt;Also, I enabled the rpctrace and dlmtrace debug flags on the clients and am running it again. Hopefully the issue arises with this extra debug info enabled. Will the output be directed to the console? or will I need to run lctl dk to get at it?&lt;/p&gt;</comment>
                            <comment id="18750" author="prakash" created="Fri, 5 Aug 2011 12:24:22 +0000"  >&lt;p&gt;Results from another run of this test. The same MDTEST.* and lustre.log-$DATE.gz files from this run are included. Additionally, a log dump from one of the evicted clients (zwicky136) with rpctrace and dlmtrace debug flags enabled is included.&lt;/p&gt;</comment>
                            <comment id="19033" author="bobijam" created="Wed, 10 Aug 2011 09:45:51 +0000"  >&lt;p&gt;Prakash,&lt;/p&gt;

&lt;p&gt;From MDTEST.stderr, I got:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;srun: Node zwicky115, 8 tasks started
srun: Node zwicky120, 8 tasks started
099: srun: mvapich: 2011-08-03T21:58:37: ABORT from MPI rank 99 [on zwicky114]
[99] [MPI Abort by user] Aborting Program!
099: [99:zwicky114] Abort: MPI_Abort() code: 1, rank 99, MPI Abort by user Aborting program ! at line 99 in file mpid_init.c
097: [97] [MPI Abort by user] Aborting Program!
097: [97:zwicky114] Abort: MPI_Abort() code: 1, rank 97, MPI Abort by user Aborting program ! at line 99 in file mpid_init.c
100: [100] [MPI Abort by user] Aborting Program!
100: [100:zwicky114] Abort: MPI_Abort() code: 1, rank 100, MPI Abort by user Aborting program ! at line 99 in file mpid_init.c
048: slurmd[zwicky108]: *** STEP 1000293.0 KILLED AT 2011-08-03T21:58:37 WITH SIGNAL 9 ***
416: slurmd[zwicky160]: *** STEP 1000293.0 KILLED AT 2011-08-03T21:58:37 WITH SIGNAL 9 ***
...
240: slurmd[zwicky136]: *** STEP 1000293.0 KILLED AT 2011-08-03T21:58:37 WITH SIGNAL 9 ***

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;What made zwicky114 abort the test (it said &quot;MPI Abort by user&quot;)? And what caused the other clients to be killed with SIGNAL 9?&lt;/p&gt;</comment>
                            <comment id="19042" author="prakash" created="Wed, 10 Aug 2011 13:55:15 +0000"  >&lt;p&gt;I&apos;m not entirely certain what caused the abortion, but I imagine it is related to the eviction. My guess would be that slurm killed the job. I know slurm does that at times, although I don&apos;t know how it decides when to kill it. Is Chris watching this issue? He might be able to give a better answer. My slurm and mdtest experience is still still pretty limited..&lt;/p&gt;</comment>
                            <comment id="19048" author="pjones" created="Wed, 10 Aug 2011 14:15:10 +0000"  >&lt;p&gt;Chris was not watching but I just added him so hopefully he will now chip in and comment&lt;/p&gt;</comment>
                            <comment id="19051" author="pjones" created="Wed, 10 Aug 2011 14:33:56 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;Could you please try and reproduce this issue?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="19061" author="bobijam" created="Thu, 11 Aug 2011 03:40:25 +0000"  >&lt;p&gt;would you mind also uploading zwicky114&apos;s debug log?&lt;/p&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;zwicky136&apos;s client debug log&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;00000100:00080000:1.0:1312433918.391712:0:3861:0:(import.c:929:ptlrpc_connect_interpret()) @@@ lc2-MDT0000-mdc-ffff880213030400: evicting (reconnect/recover flags not set: 4)  req@ffff8801e6267c00 x1375691751479664/t0(0) o-1-&amp;gt;lc2-MDT0000_UUID@10.1.1.212@o2ib9:12/10 lens 368/264 e 0 to 0 dl 1312433938 ref 1 fl Interpret:RN/ffffffff/ffffffff rc 0/-1&lt;br/&gt;
00000100:00100000:0.0:1312433918.391715:0:3860:0:(client.c:1392:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd:496561de-a766-0837-467a-298d86c35661:3860:1375691751479700:10.1.1.50@o2ib9:400&lt;br/&gt;
00000100:00080000:1.0:1312433918.391717:0:3861:0:(import.c:932:ptlrpc_connect_interpret()) ffff8801fae1a000 lc2-MDT0000_UUID: changing import state from CONNECTING to EVICTED&lt;br/&gt;
00000100:02020000:1.0:1312433918.391720:0:3861:0:(import.c:1312:ptlrpc_import_recovery_state_machine()) 167-0: This client was evicted by lc2-MDT0000; in progress operations using this service will fail.&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;zwicky136 was evicted by the MDS; your test does not reboot the MDS, does it?&lt;/p&gt;</comment>
                            <comment id="19063" author="bobijam" created="Thu, 11 Aug 2011 05:31:53 +0000"  >&lt;p&gt;Maybe we also need MDS and/or OSS debug log (depends on which server evict clients) to see what happens to make server evict clients.&lt;/p&gt;</comment>
                            <comment id="19064" author="bobijam" created="Thu, 11 Aug 2011 06:34:47 +0000"  >&lt;p&gt;from lustre.log-20110804&lt;/p&gt;

&lt;div class=&quot;panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;$ grep &quot;evicting&quot; lustre.log-20110804&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;panelContent&quot;&gt;
&lt;p&gt;13912:Aug  3 07:18:10 zwicky-mds2 kernel: Lustre: MGS: haven&apos;t heard from client 71505a8c-ba89-4584-7b3d-bc9313e6a77e (at 10.1.1.49@o2ib9) in 228 seconds. I think it&apos;s dead, and I am evicting it. exp ffff8807ed5bb400, cur 1312381090 expire 1312380940 last 1312380862&lt;br/&gt;
15337:Aug  3 07:40:11 zwicky-mds2 kernel: Lustre: MGS: haven&apos;t heard from client 5d784239-e35d-b7d2-39aa-cd2ac325d166 (at 10.1.1.50@o2ib9) in 227 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88082f24a000, cur 1312382411 expire 1312382261 last 1312382184&lt;br/&gt;
16868:Aug  3 08:02:09 zwicky-mds2 kernel: Lustre: MGS: haven&apos;t heard from client bef2aaa1-8742-9da3-c1bb-ca4469b02bfa (at 10.1.1.47@o2ib9) in 227 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88082f27b800, cur 1312383729 expire 1312383579 last 1312383502&lt;br/&gt;
18391:Aug  3 08:24:08 zwicky-mds2 kernel: Lustre: MGS: haven&apos;t heard from client b99c22a5-3135-69ed-4913-98d6ef6890a8 (at 10.1.1.48@o2ib9) in 228 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88082f244400, cur 1312385048 expire 1312384898 last 1312384820&lt;br/&gt;
18392:Aug  3 08:24:11 zwicky-mds2 kernel: LustreError: 0:0:(ldlm_lockd.c:353:waiting_locks_callback()) ### lock callback timer expired after 121s: evicting client at 192.168.65.159@o2ib6  ns: mdt-ffff88062e597800 lock: ffff880215a09900/0x7fc037415eb6f2fa lrc: 3/0,0 mode: PR/PR res: 8589955869/128531 bits 0x3 rrc: 2 type: IBT flags: 0x4000020 remote: 0x63f5e2e4272fe344 expref: 6 pid: 11575 timeout: 4720731264&lt;br/&gt;
18406:Aug  3 08:24:14 zwicky-mds2 kernel: LustreError: 0:0:(ldlm_lockd.c:353:waiting_locks_callback()) ### lock callback timer expired after 124s: evicting client at 192.168.65.113@o2ib6  ns: mdt-ffff88062e597800 lock: ffff8802233066c0/0x7fc037415eb6f1fe lrc: 3/0,0 mode: PR/PR res: 8589955877/128725 bits 0x3 rrc: 2 type: IBT flags: 0x4000020 remote: 0x1f6ab0dd71fd7054 expref: 6 pid: 11588 timeout: 4720734633&lt;br/&gt;
18410:Aug  3 08:24:15 zwicky-mds2 kernel: LustreError: 0:0:(ldlm_lockd.c:353:waiting_locks_callback()) ### lock callback timer expired after 125s: evicting client at 192.168.65.129@o2ib6  ns: mdt-ffff88062e597800 lock: ffff8807e1e5b240/0x7fc037415eb6f23d lrc: 3/0,0 mode: PR/PR res: 8589955863/130615 bits 0x3 rrc: 2 type: IBT flags: 0x4000020 remote: 0x8b1f51c6458607ef expref: 6 pid: 11614 timeout: 4720735521&lt;br/&gt;
19895:Aug  3 08:46:08 zwicky-mds2 kernel: Lustre: MGS: haven&apos;t heard from client 1899ad40-4369-3323-c979-64ddae8268dd (at 10.1.1.49@o2ib9) in 227 seconds. I think it&apos;s dead, and I am evicting it. exp ffff880623bae000, cur 1312386368 expire 1312386218 last 1312386141&lt;br/&gt;
....&lt;br/&gt;
72070:Aug  3 21:36:40 zwicky-mds2 kernel: Lustre: MGS: haven&apos;t heard from client 5069322b-5692-cb6b-00fe-d476448b97d0 (at 10.1.1.48@o2ib9) in 228 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88022e7fc000, cur 1312432600 expire 1312432450 last 1312432372&lt;br/&gt;
73623:Aug  3 21:58:16 zwicky-mds2 kernel: LustreError: 0:0:(ldlm_lockd.c:353:waiting_locks_callback()) ### lock callback timer expired after 177s: evicting client at 192.168.65.136@o2ib6  ns: mdt-ffff88062e597800 lock: ffff8804f8c9db40/0x7fc0374182497777 lrc: 3/0,0 mode: PR/PR res: 8589955938/117939 bits 0x3 rrc: 2 type: IBT flags: 0x4000020 remote: 0x2a0a36f5907d39a9 expref: 6 pid: 11581 timeout: 4769576674&lt;br/&gt;
73624:Aug  3 21:58:25 zwicky-mds2 kernel: LustreError: 11581:0:(ldlm_lockd.c:353:waiting_locks_callback()) ### lock callback timer expired after 186s: evicting client at 192.168.65.130@o2ib6  ns: mdt-ffff88062e597800 lock: ffff880358a02b40/0x7fc0374182497bb4 lrc: 3/0,0 mode: PR/PR res: 8589955935/118438 bits 0x3 rrc: 2 type: IBT flags: 0x4000020 remote: 0x86f348938dde31b9 expref: 6 pid: 11591 timeout: 4769585675&lt;br/&gt;
73654:Aug  3 21:58:41 zwicky-mds2 kernel: Lustre: MGS: haven&apos;t heard from client 6d540fa0-c8e4-467f-8ef8-a356affe3ffb (at 10.1.1.49@o2ib9) in 227 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88061ccb0400, cur 1312433921 expire 1312433771 last 1312433694&lt;br/&gt;
75122:Aug  3 22:20:49 zwicky-mds2 kernel: Lustre: MGS: haven&apos;t heard from client 80f3e272-b241-545d-dd92-03f431d17b56 (at 10.1.1.50@o2ib9) in 232 seconds. I think it&apos;s dead, and I am evicting it. exp ffff8804e172b400, cur 1312435249 expire 1312435099 last 1312435017&lt;br/&gt;
76644:Aug  3 22:42:49 zwicky-mds2 kernel: Lustre: MGS: haven&apos;t heard from client 283fce0a-2dfd-7651-7efa-8642eb937fb0 (at 10.1.1.47@o2ib9) in 228 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88042df07000, cur 1312436569 expire 1312436419 last 1312436341&lt;br/&gt;
78079:Aug  3 23:04:46 zwicky-mds2 kernel: Lustre: MGS: haven&apos;t heard from client 2be415db-52d4-9d69-d61a-eef3f6ace520 (at 10.1.1.48@o2ib9) in 228 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88061cec0000, cur 1312437886 expire 1312437736 last 1312437658&lt;br/&gt;
...&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I guess zwicky47/48/49/50 are the OSS nodes, aren&apos;t they? The MDS evicts the OSSes because they are failing over.&lt;/p&gt;

&lt;p&gt;Meanwhile, the nodes at 192.168.65.159/192.168.65.113/192.168.65.129/192.168.65.136/192.168.65.130 (presumably zwicky159/113/129/136/130) could not respond to the MDS&apos;s lock callbacks in time and hence were evicted by the MDS.&lt;/p&gt;</comment>
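<!--
    Illustrative sketch (file name taken from the comment above, commands assumed): separating the
    two kinds of evictions seen in the console log, idle clients evicted by the MGS (expected while
    the OSS nodes fail over) versus clients evicted because a lock callback timed out (the
    suspicious cases).

        grep "haven't heard from client" lustre.log-20110804
        grep "lock callback timer expired" lustre.log-20110804
-->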
                            <comment id="19072" author="prakash" created="Thu, 11 Aug 2011 11:28:35 +0000"  >&lt;p&gt;Yes zwicky&lt;span class=&quot;error&quot;&gt;&amp;#91;47-50&amp;#93;&lt;/span&gt; are the OSS nodes undergoing failover and failback, So the evictions dealing with those nodes I think are to be expected. The MDS on the other hand is not being rebooted. So the question is, why can&apos;t the clients respond back to the MDS in time?&lt;/p&gt;

&lt;p&gt;Perhaps it is an adaptive timeout issue, where the MDS is getting stuck for some reason and doesn&apos;t get back to the client in a resonable amount of time.&lt;/p&gt;

&lt;p&gt;I will work on reproducing this issue again and enable the dump_on_eviction flag on the MDS as well as the clients. Then we may be able to correlate the logs from the server and the client and get a better overall picture of what is happening.&lt;/p&gt;

&lt;p&gt;Besides &apos;ha&apos;, are there any other flags I should set? Currently I am just using the defaults &apos;ioctl neterror warning error emerg config console&apos; and adding &apos;+rpctrace&apos; and &apos;+dlmtrace&apos;.&lt;/p&gt;

&lt;p&gt;And would you still like me to dig up the console logs for zwicky114? If so, for what day?&lt;/p&gt;</comment>
                            <comment id="19089" author="bobijam" created="Thu, 11 Aug 2011 12:12:54 +0000"  >&lt;p&gt;I think &apos;+ha&apos; &apos;+rpctrace&apos; &apos;+dlmtrace&apos; is good enough. Yes, I just don&apos;t get what really happens to make mdtest.stderr states that zwicky114 was aborted by user, wondering whether it relates to the eviction issue we met here. &lt;/p&gt;</comment>
                            <comment id="19165" author="pjones" created="Fri, 12 Aug 2011 10:17:48 +0000"  >&lt;p&gt;Prakash&lt;/p&gt;

&lt;p&gt;When do you expect to be able to gather this information?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="19168" author="prakash" created="Fri, 12 Aug 2011 11:47:40 +0000"  >&lt;p&gt;Here&apos;s the results from another run of &apos;mdtest&apos;. The directory structure is a little different than before, but I imagine it shouldn&apos;t be too hard to grok. Ping me if you have any questions. I had to remove some log files to get the archive under the 10MB limit. &lt;/p&gt;

&lt;p&gt;Also, this run of mdtest was actually able to complete successfully, although it looks like some of the clients still saw evictions.&lt;/p&gt;

&lt;p&gt;The MDS didn&apos;t output any dump log files even with &apos;dump_on_eviction&apos; enabled. Is this expected?&lt;/p&gt;

&lt;p&gt;Let me know if you need me to upload any of the console logs, as these are not in the archive.&lt;/p&gt;</comment>
                            <comment id="19169" author="prakash" created="Fri, 12 Aug 2011 11:49:10 +0000"  >&lt;p&gt;Sorry for the delay Peter.&lt;/p&gt;</comment>
                            <comment id="19222" author="bobijam" created="Sun, 14 Aug 2011 23:05:45 +0000"  >&lt;p&gt;Prakash Surya,&lt;/p&gt;

&lt;p&gt;Your latest logs show that the clients&apos; reconnections to the MGS timed out; they were evicted by the MGS and later recovered. So I doubt your original test failure can be blamed on eviction.&lt;/p&gt;

&lt;p&gt;Also, only the clients experienced evictions, so it is the clients, not the MDS, that dump logs when a client gets evicted.&lt;/p&gt;</comment>
                            <comment id="19228" author="prakash" created="Mon, 15 Aug 2011 12:45:12 +0000"  >&lt;p&gt;This run is from a failed run of mdtest. It contains a dump from the MDS along with a few clients.&lt;/p&gt;</comment>
                            <comment id="19229" author="prakash" created="Mon, 15 Aug 2011 12:53:06 +0000"  >&lt;p&gt;Hi bobijam, I just uploaded the results from another failed run which contains both client and MDS dump files. I haven&apos;t been able to look too closely at the dumps yet, but hopefully we can piece together a better picture of the problem with the information from the MDS.&lt;/p&gt;</comment>
                            <comment id="19232" author="green" created="Mon, 15 Aug 2011 13:54:10 +0000"  >&lt;p&gt;I looked through the logs, in particular this bit from client 151:&lt;/p&gt;

&lt;p&gt;00010000:00010000:0.0:1313196271.425540:0:3932:0:(ldlm_request.c:1085:ldlm_cancel_pack()) ### packing ns: lc2-MDT0000-mdc-ffff88021302d000 lock: ffff8801eab96d80/0x3fe5176e75f9e59f lrc: 2/0,0 mode: --/PR res: 8589956318/75033 bits 0x3 rrc: 1 type: IBT flags: 0x30002090 remote: 0x7fc03742338bfc8b expref: -99 pid: 15372 timeout: 0&lt;br/&gt;
00010000:00010000:0.0:1313196271.425545:0:3932:0:(ldlm_request.c:1089:ldlm_cancel_pack()) 1 locks packed&lt;br/&gt;
00000100:00100000:0.0:1313196271.425548:0:3932:0:(client.c:1990:ptlrpc_set_wait()) set ffff8801fd74ccc0 going to sleep for 0 seconds&lt;br/&gt;
00000100:00100000:0.0:1313196271.425550:0:3932:0:(client.c:1726:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ldlm_bl_01:a4a6aee8-112a-dc35-e195-4b88f07a26e6:3932:1376864457802283:10.1.1.212@o2ib9:103&lt;br/&gt;
00010000:00020000:0.0:1313196271.425552:0:3932:0:(ldlm_request.c:1169:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway&lt;/p&gt;

&lt;p&gt;So we did EINTR right away without even attempting to send anything.&lt;br/&gt;
I don&apos;t see the interrupted set CERROR message that&apos;s right at the beginning of it, though.&lt;/p&gt;

&lt;p&gt;00010000:00020000:0.0:1313196271.445106:0:3932:0:(ldlm_request.c:1796:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11&lt;br/&gt;
00010000:00010000:0.0:1313196271.463005:0:3932:0:(ldlm_lockd.c:1506:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: lc2-MDT0000-mdc-ffff88021302d000 lock: ffff8801eab96d80/0x3fe5176e75f9e59f lrc: 1/0,0 mode: --/PR res: 8589956318/75033 bits 0x3 rrc: 1 type: IBT flags: 0x30002090 remote: 0x7fc03742338bfc8b expref: -99 pid: 15372 timeout: 0&lt;/p&gt;

&lt;p&gt;Naturally since we did not send anything, the server later evicted this client.&lt;/p&gt;

&lt;p&gt;It seems that the &quot;going to sleep for 0 seconds&quot; is unusual; all other instances of the blocking callback went to sleep for 23 seconds and succeeded.&lt;br/&gt;
So this is likely the error that needs to be found.&lt;/p&gt;</comment>
                            <comment id="19234" author="green" created="Mon, 15 Aug 2011 14:39:18 +0000"  >&lt;p&gt;Ok, further looking into the log revealed this:&lt;br/&gt;
00000100:02020000:1.0:1313196264.669907:0:3915:0:(client.c:1122:ptlrpc_check_status()) 11-0: an error occurred while communicating with 10.1.1.212@o2ib9. The mds_connect operation failed with -16&lt;br/&gt;
00000100:00080000:1.0:1313196264.669909:0:3915:0:(import.c:1125:ptlrpc_connect_interpret()) ffff8801fa85f800 lc2-MDT0000_UUID: changing import state from CONNECTING to DISCONN&lt;/p&gt;

&lt;p&gt;So the import is not in the full state which prevents us from sending the cancel out.&lt;/p&gt;

&lt;p&gt;The no_delay bit of logic was added by a patch from bug 11974. In particular comment #34 from Andreas asks exactly about this scenario. Supposedly this could not happen and I explain it in comment #35, but of course the reality is slightly different from theory &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/wink.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;Basically, what happened with at least some of the clients is that they sent requests to the MDS and those requests got stuck waiting on a lock that was not released. The clients then timed out (with the deadline apparently set at 73 seconds) and tried to reconnect.&lt;br/&gt;
At the same time the server still maintains the connection and is able to send things like ASTs back to the client. Of course this does not help the client at all, since the client cannot send anything back even if it wanted to until it reconnects, but the server is not letting the client reconnect because it is already processing a request from this client.&lt;/p&gt;

&lt;p&gt;Now, the original reason why some of the requests got stuck on the MDS is still unclear at this point, but I have only reviewed two client logs. Also, the MDS log does not go far enough into the past to see it on that end (possibly the MDS getting stuck waiting for objects from the OSC with locks held?).&lt;/p&gt;

&lt;p&gt;I guess the &quot;MDS waiting for objects with locks held&quot; scenario needs to be investigated first and, if confirmed, relieved, as that was a massive source of failures in the past too.&lt;/p&gt;</comment>
                            <comment id="19235" author="prakash" created="Mon, 15 Aug 2011 16:20:26 +0000"  >&lt;p&gt;Thanks for the analysis Oleg. Your description definitely helps my understanding of the problem.&lt;/p&gt;

&lt;p&gt;As far as the MDS log, would it help to increase the buffer size on the MDS and reproduce the problem? Currently the size of the buffer on the MDS is:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# zwicky-mds2 /root &amp;gt; cat /proc/sys/lnet/debug_mb
81
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="19258" author="green" created="Mon, 15 Aug 2011 22:59:08 +0000"  >&lt;p&gt;Well, 82M is definitely not enough as we can see.If you can spare the memory, the maximum amount for the log is 2G &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/wink.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="19284" author="prakash" created="Tue, 16 Aug 2011 12:00:08 +0000"  >&lt;p&gt;Here&apos;s the memory info for the MDS while it is essentially idle:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# zwicky-mds2 /root &amp;gt; free
             total       used       free     shared    buffers     cached
Mem:      32881440   26163084    6718356          0   23873776     215796
-/+ buffers/cache:    2073512   30807928
Swap:            0          0          0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There&apos;s about 6.5G available, but I&apos;m not familiar enough with the memory usage of the MDS to really know if it could spare the 2G for the log when it is under load. I will go ahead and try it and see what happens. Let me know if you think another size would be better.&lt;/p&gt;</comment>
                            <comment id="19300" author="morrone" created="Tue, 16 Aug 2011 19:03:51 +0000"  >&lt;p&gt;Prakash, the 6.5GB number by itself is not particularly useful.  With Linux, the &quot;free&quot; value always tends to be fairly low because the kernel will always employ much of the free memory for buffers and cache.  The kernel there is using almost 24GB for buffers and 215MB for cache.  I don&apos;t think that 2GB for the log will be a significant problem.&lt;/p&gt;</comment>
                            <comment id="24414" author="pjones" created="Mon, 12 Dec 2011 17:42:49 +0000"  >&lt;p&gt;Is this still a live issue or has it become obsolete as testing has moved forward onto more current code?&lt;/p&gt;</comment>
                            <comment id="24613" author="prakash" created="Tue, 13 Dec 2011 11:42:12 +0000"  >&lt;p&gt;On our end, we&apos;ve moved on to testing more current 2.1 code. Unless Chris chimes in and says otherwise, I think its safe to obsolete this issue.&lt;/p&gt;</comment>
                            <comment id="24614" author="pjones" created="Tue, 13 Dec 2011 11:47:20 +0000"  >&lt;p&gt;Thanks Prakash. I will close out the ticket for now and you can always reopen it if Chris feels that it would be worth pursuing this further under this ticket (rather than opening a new one)&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="10342" name="MDTEST7.tar.gz" size="9789440" author="prakash" created="Fri, 5 Aug 2011 12:24:22 +0000"/>
                            <attachment id="10338" name="mdtest-results.tar.gz" size="1048604" author="prakash" created="Tue, 2 Aug 2011 17:42:04 +0000"/>
                            <attachment id="10354" name="results.tar.gz" size="8367591" author="prakash" created="Mon, 15 Aug 2011 12:45:12 +0000"/>
                            <attachment id="10350" name="results.tar.gz" size="6789595" author="prakash" created="Fri, 12 Aug 2011 11:47:40 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvi1b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6571</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>