<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:41:25 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4292] Lustre data loss when client hang up</title>
                <link>https://jira.whamcloud.com/browse/LU-4292</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Data loss observed on Lustre 2.1.6 and later, on RHEL 6.x. If a file was opened or modified within approximately 30 seconds before the server failed, all data in that file is erased. Has anyone seen a similar error?&lt;/p&gt;</description>
                <environment>OS: RHEL 6.4, 6.3; Lustre versions: 2.1.6, 2.3, 2.4.1, 2.5. The error was observed on several different hardware configurations.</environment>
        <key id="22206">LU-4292</key>
            <summary>Lustre data loss when client hang up</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="jamesanunez">James Nunez</assignee>
                                    <reporter username="alexander">Alexander</reporter>
                        <labels>
                            <label>server</label>
                    </labels>
                <created>Fri, 22 Nov 2013 06:55:06 +0000</created>
                <updated>Wed, 26 Apr 2017 20:02:08 +0000</updated>
                            <resolved>Wed, 26 Apr 2017 20:02:08 +0000</resolved>
                                    <version>Lustre 2.4.1</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="72190" author="adilger" created="Sat, 23 Nov 2013 16:52:32 +0000"  >&lt;p&gt;Could you please explain your testing process further?  Does this affect existing files that are opened, or only newly created files?  If the server fails, but the client is still running, and then the server is restarted, is there data loss?&lt;/p&gt;</comment>
                            <comment id="72207" author="alexander" created="Mon, 25 Nov 2013 05:05:10 +0000"  >&lt;p&gt;There is a prototype consisting of 2 servers.  One server of meta data and object data.  The second server is used as the client.  Let&apos;s allow in file system already there was a certain file_1.txt file in which &quot;a, b, c&quot; was written.  If from the client to execute record in this file &quot;echo d &amp;gt;&amp;gt; /mnt/file_1.txt&quot; that in file /proc/fs/osc/lustre-OST-*/cur_dirty_bytes will appear instead of 0 4096 (byte).  Data are stored in &quot;dirty_data&quot; about 30 seconds then probably it is carried out from disk writing.  And so, if the client machine fails before these will expire 30 seconds that in the /mnt/file_1.txt file will be gone not only those data which were just written (by means of the &quot;echo d &amp;gt;&amp;gt; /mnt/file_1.txt&quot; command), but also those that were written to it.  I.e. this file will be empty. &lt;br/&gt;
In case of study file Lustre-x.x/lustre/ChangeLog found a similar error which can be found bugzilla 12181, 12203, but it is dated 2007-04-01.&lt;/p&gt;</comment>
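&lt;p&gt;To make the steps concrete, a minimal sketch of this reproduction (the mount point and proc path are the ones from this report; the exact proc path varies between Lustre versions):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# on the client: append to an existing file, then confirm the data is dirty
echo d &amp;gt;&amp;gt; /mnt/file_1.txt
cat /proc/fs/osc/lustre-OST-*/cur_dirty_bytes   # expect 4096 instead of 0
# crash the client within ~30 seconds, while cur_dirty_bytes is non-zero;
# after recovery, check whether /mnt/file_1.txt still holds &quot;a, b, c&quot;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>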
                            <comment id="72429" author="pjones" created="Wed, 27 Nov 2013 19:38:58 +0000"  >&lt;p&gt;James&lt;/p&gt;

&lt;p&gt;Could you please see if you can reproduce this reported behavior?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="74642" author="alexander" created="Thu, 9 Jan 2014 12:36:23 +0000"  >&lt;p&gt;It would be desirable to learn, whether it managed to you to repeat this situation?&lt;/p&gt;</comment>
                            <comment id="74663" author="jamesanunez" created="Thu, 9 Jan 2014 17:44:25 +0000"  >&lt;p&gt;Alexander, &lt;/p&gt;

&lt;p&gt;I&apos;m having trouble recreating this problem.  &lt;/p&gt;

&lt;p&gt;I have a single server with MDS/MGS and OSS and a single client. I write to a file using &quot;echo&quot; and I can see the cur_dirty_bytes stay at 4096 for a while and then change to 0, but I can&apos;t get any data loss or corruption. To fail the client I&apos;ve used reboot and shutdown immediately after the echo command was issued.  &lt;/p&gt;

&lt;p&gt;So, I may be doing things differently from you. How did you fail the client?&lt;/p&gt;</comment>
                            <comment id="74721" author="alexander" created="Fri, 10 Jan 2014 04:48:18 +0000"  >&lt;p&gt;James,&lt;/p&gt;

&lt;p&gt;With a clean reboot or poweroff, no data loss is observed. The data loss occurs when the operating system hangs or, for example, when power is forcibly cut to the client. In other words, if the client hangs or is forcibly disconnected from its power supply at a moment when the cur_dirty_bytes file holds the value 4096, the data is lost.&lt;/p&gt;
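&lt;p&gt;One way to simulate such an abrupt failure without physically cutting power is the standard Linux SysRq interface (a sketch, not something used in this ticket): &quot;b&quot; reboots the machine immediately, with no sync and no unmount:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# as root on the client, right after the write:
echo 1 &amp;gt; /proc/sys/kernel/sysrq    # enable SysRq if it is disabled
echo b &amp;gt; /proc/sysrq-trigger       # reboot instantly, skipping sync/unmount
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>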
                            <comment id="74858" author="jamesanunez" created="Mon, 13 Jan 2014 21:38:54 +0000"  >&lt;p&gt;Alexander, &lt;/p&gt;

&lt;p&gt;I&apos;ve moved my testing to using VMs so that I can shut off machines to mimic a node going down. Here is what I can reproduce, using the b2_3 branch of Lustre:&lt;/p&gt;

&lt;p&gt;I have a single-node MDS and OSS and two clients. One client writes data to a file called test_file and the write completes normally. That same client then writes more data to the existing file; I print cur_dirty_bytes, which is 4096, and then I power off the VM, which does not flush the cache. I go to the second client and try to list the contents of the Lustre file system, and it hangs. If I Ctrl-C the listing, I get something like:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@client2 ~]# ls -l /lustre/scratch/
^Cls: cannot access /lustre/scratch/test_file: Interrupted system call
total 0
-????????? ? ? ? ?            ? test_file
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If I wait for a couple of minutes, the original file/data is restored, but the data in flight never makes it to the file:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@client2 ~]# ls /lustre/scratch/
client_2_touch  test_file
[root@client2 ~]# more /lustre/scratch/test_file 
a, b, c
d, e, f
g, h, i
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Does this match your experience? If you wait a few minutes, does the file that your client was writing to, /mnt/file_1 in your example, come back and is the original data &quot;a, b, c&quot; there?&lt;/p&gt;</comment>
                            <comment id="74904" author="alexander" created="Tue, 14 Jan 2014 07:18:38 +0000"  >&lt;p&gt;James,&lt;/p&gt;

&lt;p&gt;     The experiment you ran matches what I saw, except for one point. I ran the experiment not on VMs but on real servers, so on the real servers the cache really was cleared (dropped).&lt;br/&gt;
     Perhaps test_file was recovered from a cache only after you booted the VM you had switched off. Could that be the case?&lt;br/&gt;
     Please try to repeat your experiment in such a way that a forced VM reset/power-off also drops (clears) its cache.&lt;/p&gt;
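&lt;p&gt;If the VMs are managed through libvirt/KVM (an assumption; the hypervisor is not named in this ticket), a hard power-off that discards the guest&apos;s memory, unlike a clean shutdown, can be done like this (domain name hypothetical):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;virsh destroy client2    # hard power-off, like pulling the plug
# (contrast with &quot;virsh shutdown client2&quot;, which is a clean ACPI shutdown)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>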
                            <comment id="75039" author="jay" created="Wed, 15 Jan 2014 22:52:09 +0000"  >&lt;p&gt;Hi Alexander, when you saw the empty, do you notice what&apos;s the file size?&lt;/p&gt;

&lt;p&gt;You can try to write something into the file, and then make sure the content has been written to OST, and then reproduce this problem. For example:&lt;/p&gt;

&lt;p&gt;   echo &quot;hello world&quot; &amp;gt; test_file;&lt;br/&gt;
   sync; sync; sync&lt;br/&gt;
   run your test to append the file&lt;/p&gt;

&lt;p&gt;After you&apos;ve reproduced the problem, please check what the file size is.&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
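&lt;p&gt;To read just the size in bytes (plain coreutils, shown only as a convenience; ls -l as used elsewhere in this ticket works equally well):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;stat -c %s test_file    # prints the file size in bytes
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>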
                            <comment id="75066" author="alexander" created="Thu, 16 Jan 2014 07:47:01 +0000"  >&lt;p&gt;Hi Jinshan,&lt;/p&gt;

&lt;p&gt;I wrote a script along these lines:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;#!/bin/bash
# whenever any OSC reports dirty bytes, force a sync so the data reaches disk

path=/proc/fs/lustre/osc
ost_num=test-OST000
name=cur_dirty_bytes

while true;
do
  for j in 0 1;
    do
      # assumes the glob matches exactly one cur_dirty_bytes file per index
      c=$(cat ${path}/${ost_num}${j}*/${name});
      if [ &quot;${c}&quot; -ne 0 ]; then
        sync;
      fi
    done
  sleep 0.3;
done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This script forces dirty data to be flushed from the memory cache to disk (if any is present) at an interval of roughly one second.&lt;/p&gt;

&lt;p&gt;But it only reduces the probability of this situation. By default, dirty data is held for about 30 seconds (visible in cur_dirty_bytes) before it is automatically written from the memory cache to disk. So the possibility of data loss remains if the client hangs at a moment when data is still in the cache and the sync has not yet completed.&lt;/p&gt;
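&lt;p&gt;If you rely on such a watchdog, it has to be started on every client and kept running; a minimal way to launch it in the background (the script path is hypothetical):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;nohup /usr/local/bin/sync_dirty.sh &amp;gt;/dev/null 2&amp;gt;&amp;amp;1 &amp;amp;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>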
                            <comment id="75095" author="jamesanunez" created="Thu, 16 Jan 2014 16:52:54 +0000"  >&lt;p&gt;Alexander, &lt;/p&gt;

&lt;p&gt;Can you reproduce the data loss on your cluster/nodes? In other words, is this a persistent problem, does it happen occasionally or did it stop happening?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="75163" author="alexander" created="Fri, 17 Jan 2014 08:35:33 +0000"  >&lt;p&gt;James,&lt;/p&gt;

&lt;p&gt;Yes, this problem occurs reliably for me under the conditions described in my previous messages. I can reproduce this error on real equipment as well.&lt;/p&gt;</comment>
                            <comment id="78206" author="alexander" created="Mon, 3 Mar 2014 12:44:03 +0000"  >&lt;p&gt;James,&lt;/p&gt;

&lt;p&gt;Hello, is there any news on this subject?&lt;/p&gt;</comment>
                            <comment id="79190" author="jamesanunez" created="Wed, 12 Mar 2014 21:14:57 +0000"  >&lt;p&gt;Alexander, &lt;/p&gt;

&lt;p&gt;Nothing new to report; I am not able to reproduce this error. Would you please let me know what your mount options are when you experience the data loss? Also, do you have logs from these nodes when the data loss happens and would you make those available to me?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;</comment>
                            <comment id="81515" author="sasha111" created="Mon, 14 Apr 2014 07:46:53 +0000"  >&lt;p&gt;James,&lt;/p&gt;

&lt;p&gt;   I reproduced this situation on the following test setup:&lt;br/&gt;
Two virtual machines: the first (VM1) is used as MDS/OSS, the second (VM2) as the client. x86_64 RHEL 6.4 operating system, lustre 2.4.1.&lt;/p&gt;

&lt;p&gt;   The sequence of actions was as follows:&lt;/p&gt;

&lt;p&gt;   On VM2, write data to /mnt/file:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@n2 ~]# date &amp;amp;&amp;amp; dmesg &amp;gt; /mnt/file &amp;amp;&amp;amp; ll /mnt/file &amp;amp;&amp;amp; date
Thu Apr 10 02:58:16 EDT 2014
-rw-r--r-- 1 root root 79650 Apr 10 02:58 /mnt/file
Thu Apr 10 02:58:16 EDT 2014
[root@n2 ~]#
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;   On VM1 do:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@n1 common]# date &amp;amp;&amp;amp; ll /mnt/file &amp;amp;&amp;amp; date
Thu Apr 10 02:58:22 EDT 2014
-rw-r--r-- 1 root root 79650 Apr 10 02:58 /mnt/file
Thu Apr 10 02:58:22 EDT 2014
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;   The file at /mnt has size 79650.&lt;/p&gt;

&lt;p&gt;   Forcibly power off VM2.&lt;/p&gt;

&lt;p&gt;   On VM1 do:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@n1 common]# date &amp;amp;&amp;amp; ll /mnt/file &amp;amp;&amp;amp; date
Thu Apr 10 02:58:33 EDT 2014
-rw-r--r-- 1 root root 79650 Apr 10 02:58 /mnt/file
Thu Apr 10 03:02:07 EDT 2014
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;   After about four minutes of waiting the command completed, and the file appeared to be non-empty; but on repeating the same command we get:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@n1 common]# date &amp;amp;&amp;amp; ll /mnt/file &amp;amp;&amp;amp; date
Thu Apr 10 03:02:13 EDT 2014
-rw-r--r-- 1 root root 0 Apr 10 02:58 /mnt/file
Thu Apr 10 03:02:13 EDT 2014
[root@n1 common]#
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;   The file is now empty. VM2 remained switched off throughout.&lt;/p&gt;

&lt;p&gt;   The relevant portion of /var/log/messages on VM1, starting from the moment power was cut to VM2, is given below:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Apr 10 02:58:40 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1397113113/real 1397113113]  req@ffff88005dc90400 x1464979110756852/t0(0) o106-&amp;gt;lustre-OST0000@11.0.0.32@tcp:15/16 lens 296/280 e 0 to 1 dl 1397113120 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Apr 10 02:58:48 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1397113120/real 1397113120]  req@ffff88005dc90400 x1464979110756852/t0(0) o106-&amp;gt;lustre-OST0000@11.0.0.32@tcp:15/16 lens 296/280 e 0 to 1 dl 1397113127 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Apr 10 02:58:55 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1397113127/real 1397113127]  req@ffff88005dc90400 x1464979110756852/t0(0) o106-&amp;gt;lustre-OST0000@11.0.0.32@tcp:15/16 lens 296/280 e 0 to 1 dl 1397113134 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Apr 10 02:59:02 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1397113134/real 1397113134]  req@ffff88005dc90400 x1464979110756852/t0(0) o106-&amp;gt;lustre-OST0000@11.0.0.32@tcp:15/16 lens 296/280 e 0 to 1 dl 1397113141 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Apr 10 02:59:09 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1397113141/real 1397113142]  req@ffff88005dc90400 x1464979110756852/t0(0) o106-&amp;gt;lustre-OST0000@11.0.0.32@tcp:15/16 lens 296/280 e 0 to 1 dl 1397113148 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Apr 10 02:59:23 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1397113155/real 1397113157]  req@ffff88005dc90400 x1464979110756852/t0(0) o106-&amp;gt;lustre-OST0000@11.0.0.32@tcp:15/16 lens 296/280 e 0 to 1 dl 1397113162 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Apr 10 02:59:23 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Apr 10 02:59:38 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1397113178/real 1397113178]  req@ffff88005dc90400 x1464979110756852/t0(0) o106-&amp;gt;lustre-OST0000@11.0.0.32@tcp:15/16 lens 296/280 e 0 to 1 dl 1397113185 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Apr 10 02:59:38 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 495793 previous similar messages
Apr 10 03:00:11 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1397113210/real 1397113210]  req@ffff88005dc90400 x1464979110756852/t0(0) o106-&amp;gt;lustre-OST0000@11.0.0.32@tcp:15/16 lens 296/280 e 0 to 1 dl 1397113217 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Apr 10 03:00:11 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 4198137 previous similar messages
Apr 10 03:01:14 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1397113274/real 1397113274]  req@ffff88005dc90400 x1464979110756852/t0(0) o106-&amp;gt;lustre-OST0000@11.0.0.32@tcp:15/16 lens 296/280 e 0 to 1 dl 1397113281 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Apr 10 03:01:14 n1 kernel: Lustre: 3118:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 9459123 previous similar messages
Apr 10 03:01:53 n1 kernel: LNet: Service thread pid 3118 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Apr 10 03:01:53 n1 kernel: Pid: 3118, comm: ll_ost00_002
Apr 10 03:01:53 n1 kernel: 
Apr 10 03:01:53 n1 kernel: Call Trace:
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa037c075&amp;gt;] ? libcfs_nid2str+0x155/0x160 [libcfs]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffff81055ad3&amp;gt;] ? __wake_up+0x53/0x70
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa03ffc86&amp;gt;] ? lnet_send+0x6d6/0xb60 [lnet]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0400c33&amp;gt;] ? LNetPut+0x313/0x860 [lnet]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa06319ca&amp;gt;] ? ptl_send_buf+0x12a/0x550 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa03f8007&amp;gt;] ? LNetMDAttach+0x427/0x5a0 [lnet]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0634f05&amp;gt;] ? ptl_send_rpc+0x565/0xc40 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa062c702&amp;gt;] ? ptlrpc_expire_one_request+0xa2/0x440 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa062d956&amp;gt;] ? ptlrpc_check_set+0xd26/0x1b20 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffff8109715c&amp;gt;] ? remove_wait_queue+0x3c/0x50
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa062e8b5&amp;gt;] ? ptlrpc_set_wait+0x165/0x8c0 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffff81063410&amp;gt;] ? default_wake_function+0x0/0x20
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa05ee0a0&amp;gt;] ? ldlm_work_gl_ast_lock+0x0/0x290 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa05f0f7b&amp;gt;] ? ldlm_run_ast_work+0x1db/0x490 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa061298b&amp;gt;] ? ldlm_glimpse_locks+0x3b/0x100 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0ea2096&amp;gt;] ? ofd_intent_policy+0x516/0x7d0 [ofd]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa05f0831&amp;gt;] ? ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa06171ef&amp;gt;] ? ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0617e16&amp;gt;] ? ldlm_handle_enqueue+0x66/0x70 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0617e20&amp;gt;] ? ldlm_server_completion_ast+0x0/0x6c0 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0e54300&amp;gt;] ? ost_blocking_ast+0x0/0x10f0 [ost]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0614830&amp;gt;] ? ldlm_server_glimpse_ast+0x0/0x3b0 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0e5d318&amp;gt;] ? ost_handle+0x1db8/0x48e0 [ost]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0640beb&amp;gt;] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa06493c8&amp;gt;] ? ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa03705de&amp;gt;] ? cfs_timer_arm+0xe/0x10 [libcfs]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0381d9f&amp;gt;] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0640729&amp;gt;] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffff81055ad3&amp;gt;] ? __wake_up+0x53/0x70
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa064a75e&amp;gt;] ? ptlrpc_main+0xace/0x1700 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0649c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffff8100c0ca&amp;gt;] ? child_rip+0xa/0x20
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0649c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffffa0649c90&amp;gt;] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
Apr 10 03:01:53 n1 kernel: [&amp;lt;ffffffff8100c0c0&amp;gt;] ? child_rip+0x0/0x20
Apr 10 03:01:53 n1 kernel: 
Apr 10 03:01:53 n1 kernel: LustreError: dumping log to /tmp/lustre-log.1397113313.3118
Apr 10 03:02:07 n1 kernel: Lustre: lustre-OST0000: haven&apos;t heard from client 7511411c-dd05-5423-aa8a-098f78d9739a (at 11.0.0.32@tcp) in 231 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88007c62b400, cur 1397113327 expire 1397113177 last 1397113096
Apr 10 03:02:07 n1 kernel: LustreError: 3118:0:(client.c:1048:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88005dc90400 x1464979110756852/t0(0) o106-&amp;gt;lustre-OST0000@11.0.0.32@tcp:15/16 lens 296/280 e 0 to 1 dl 1397113334 ref 1 fl Rpc:X/2/ffffffff rc -5/-1
Apr 10 03:02:07 n1 kernel: LustreError: 3118:0:(ldlm_lockd.c:709:ldlm_handle_ast_error()) ### client (nid 11.0.0.32@tcp) returned 0 from glimpse AST ns: filter-lustre-OST0000_UUID lock: ffff88005dd47000/0x1151412baa60ebe9 lrc: 2/0,0 mode: --/PW res: [0x8ba2:0x0:0x0].0 rrc: 2 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;81919) flags: 0x801000000000 nid: 11.0.0.32@tcp remote: 0xdcdb6436fc134adb expref: 2 pid: 3118 timeout: 0 lvb_type: 0
Apr 10 03:02:07 n1 kernel: LNet: Service thread pid 3118 completed after 213.86s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 10 03:02:43 n1 kernel: Lustre: MGS: haven&apos;t heard from client 6abc5e98-73d8-60a9-2c7a-c9503f5979d6 (at 11.0.0.32@tcp) in 272 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88005dc6e800, cur 1397113363 expire 1397113213 last 1397113091
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="187855" author="jamesanunez" created="Fri, 10 Mar 2017 18:22:06 +0000"  >&lt;p&gt;Alexander,&lt;br/&gt;
Are you still experiencing this data corruption? If you are, what version of Lustre are you using?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="193674" author="jamesanunez" created="Wed, 26 Apr 2017 20:02:08 +0000"  >&lt;p&gt;Please open a new ticket if you are still experiencing this problem.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw9wn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11775</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>