<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:37:17 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10683] write checksum errors</title>
                <link>https://jira.whamcloud.com/browse/LU-10683</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hiya,&lt;/p&gt;

&lt;p&gt;these appeared last night.&lt;br/&gt;
john100,101 are clients. arkle3,6 are OSS&apos;s. transom1 is running the fabric manager.&lt;/p&gt;

&lt;p&gt;in light of the similar looking &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9305&quot; title=&quot;Running File System Aging create write checksum errors &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9305&quot;&gt;&lt;del&gt;LU-9305&lt;/del&gt;&lt;/a&gt; I thought I would create this ticket.&lt;br/&gt;
we run default (4M) rpcs on clients and servers. our OSTs are each 4 raidz3&apos;s in 1 zpool, and have recordsize=2M. 2 OSTs per OSS. 16 OSTs total.&lt;/p&gt;

&lt;p&gt;I suppose it could be a OPA network glitch, but it affected 2 clients and 2 servers so that seems unlikely.&lt;/p&gt;

&lt;p&gt;we have just moved from zfs 0.7.5 to zfs 0.7.6. we ran ior and mdtest after this change and they were fine. these errors occurred a couple of days after that.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Feb 19 23:45:12 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111]: client csum efde5b36, server csum 54e81a18
Feb 19 23:45:12 john100 kernel: LNetError: 899:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet from 12345-192.168.44.33@o2ib44, match 1591386600741136 length 1048576 too big: 1048176 left, 1048176 allowed
Feb 19 23:45:12 arkle3 kernel: LNet: Using FMR for registration
Feb 19 23:45:12 arkle3 kernel: LustreError: 298612:0:(events.c:449:server_bulk_callback()) event type 5, status -61, desc ffff881764648600
Feb 19 23:45:12 arkle3 kernel: LustreError: 298612:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff881764648600
Feb 19 23:45:12 arkle3 kernel: LustreError: 337237:0:(ldlm_lib.c:3242:target_bulk_io()) @@@ network error on bulk WRITE  req@ffff881687561450 x1591386600741136/t0(0) o4-&amp;gt;8c8018f7-2e02-6c2b-cbcf-29133ecabf02@192.168.44.200@o2ib44:173/0 lens 608/448 e 0 to 0 dl 1519044318 ref 1 fl Interpret:/0/0 rc 0/0
Feb 19 23:45:12 arkle3 kernel: Lustre: dagg-OST0005: Bulk IO write error with 8c8018f7-2e02-6c2b-cbcf-29133ecabf02 (at 192.168.44.200@o2ib44), client will retry: rc = -110
Feb 19 23:45:12 arkle3 kernel: LNet: Skipped 1 previous similar message
Feb 19 23:45:12 john101 kernel: LNetError: 904:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet from 12345-192.168.44.36@o2ib44, match 1591895282655776 length 1048576 too big: 1048176 left, 1048176 allowed
Feb 19 23:45:12 arkle6 kernel: LustreError: 298397:0:(events.c:449:server_bulk_callback()) event type 5, status -61, desc ffff8810cb714600
Feb 19 23:45:12 arkle6 kernel: LustreError: 298397:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff8810cb714600
Feb 19 23:45:12 arkle6 kernel: LustreError: 42356:0:(ldlm_lib.c:3242:target_bulk_io()) @@@ network error on bulk WRITE  req@ffff88276c3d5850 x1591895282655776/t0(0) o4-&amp;gt;400cfa1c-7c7d-1d14-09ed-f6043574fd7c@192.168.44.201@o2ib44:173/0 lens 608/448 e 0 to 0 dl 1519044318 ref 1 fl Interpret:/0/0 rc 0/0
Feb 19 23:45:12 arkle6 kernel: Lustre: dagg-OST000b: Bulk IO write error with 400cfa1c-7c7d-1d14-09ed-f6043574fd7c (at 192.168.44.201@o2ib44), client will retry: rc = -110
Feb 19 23:45:12 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111], original client csum efde5b36 (type 4), server csum 54e81a18 (type 4), client csum now efde5b36
Feb 19 23:45:12 john100 kernel: LustreError: 924:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff880e7a9b4e00 x1591386600740944/t197580821951(197580821951) o4-&amp;gt;dagg-OST0005-osc-ffff88015a1ad800@192.168.44.33@o2ib44:6/4 lens 608/416 e 0 to 0 dl 1519044319 ref 2 fl Interpret:RM/0/0 rc 0/0
Feb 19 23:45:13 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111]: client csum efde5b36, server csum df7fda9d
Feb 19 23:45:13 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111], original client csum efde5b36 (type 4), server csum df7fda9d (type 4), client csum now efde5b36
Feb 19 23:45:13 john100 kernel: LustreError: 911:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff8807492fe300 x1591386600747696/t197580821955(197580821955) o4-&amp;gt;dagg-OST0005-osc-ffff88015a1ad800@192.168.44.33@o2ib44:6/4 lens 608/416 e 0 to 0 dl 1519044320 ref 2 fl Interpret:RM/0/0 rc 0/0
Feb 19 23:45:15 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111]: client csum efde5b36, server csum 87da008a
Feb 19 23:45:15 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111], original client csum efde5b36 (type 4), server csum 87da008a (type 4), client csum now efde5b36
Feb 19 23:45:15 john100 kernel: LustreError: 910:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff880eb3d51b00 x1591386600751360/t197580821956(197580821956) o4-&amp;gt;dagg-OST0005-osc-ffff88015a1ad800@192.168.44.33@o2ib44:6/4 lens 608/416 e 0 to 0 dl 1519044322 ref 2 fl Interpret:RM/0/0 rc 0/0
Feb 19 23:45:18 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111]: client csum efde5b36, server csum 1cc7a793
Feb 19 23:45:18 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111], original client csum efde5b36 (type 4), server csum 1cc7a793 (type 4), client csum now efde5b36
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;full log attached.&lt;br/&gt;
there were hung and unkillable user processes on the 2 clients afterwards. a reboot of the 2 clients has cleared up the looping messages of the type shown below.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Feb 20 14:28:15 arkle6 kernel: LustreError: 298395:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff882e25ef2c00
Feb 20 14:28:19 arkle3 kernel: LustreError: 298609:0:(events.c:449:server_bulk_callback()) event type 5, status -61, desc ffff88176544be00
Feb 20 14:28:19 arkle3 kernel: LustreError: 298611:0:(events.c:449:server_bulk_callback()) event type 5, status -61, desc ffff8814072e0200
Feb 20 14:28:19 arkle3 kernel: LustreError: 298609:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff88176544be00
Feb 20 14:28:19 arkle3 kernel: LustreError: 298611:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff8814072e0200
Feb 20 14:28:47 arkle6 kernel: LustreError: 298394:0:(events.c:449:server_bulk_callback()) event type 5, status -61, desc ffff882e25ef3800
Feb 20 14:28:47 arkle6 kernel: LustreError: 298394:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff882e25ef3800
Feb 20 14:28:51 arkle3 kernel: LustreError: 298609:0:(events.c:449:server_bulk_callback()) event type 5, status -61, desc ffff881615f0f000
Feb 20 14:28:51 arkle3 kernel: LustreError: 298610:0:(events.c:449:server_bulk_callback()) event type 5, status -61, desc ffff880be3931c00
Feb 20 14:28:51 arkle3 kernel: LustreError: 298609:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff881615f0f000
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
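&lt;p&gt;(for diagnosis, the checksum algorithm in use can be confirmed from a client &amp;#8211; &amp;quot;type 4&amp;quot; in the messages above should correspond to crc32c; parameter names below are the standard osc tunables, paths may vary by Lustre version)&lt;/p&gt;

```shell
# show the negotiated bulk checksum algorithm per OST connection
# (type 4 in the kernel messages = OBD_CKSUM_CRC32C, i.e. crc32c)
lctl get_param osc.*.checksum_type

# show whether wire checksums are enabled at all (1 = on)
lctl get_param osc.*.checksums
```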

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</description>
                <environment>centos7, skylake, opa, zfs 0.7.6, lustre 2.10.3 + &lt;a href=&quot;https://review.whamcloud.com/#/c/29992/&quot;&gt;https://review.whamcloud.com/#/c/29992/&lt;/a&gt;. kernel nopti on servers, pti on clients, 3.10.0-693.17.1.el7.x86_64</environment>
        <key id="50865">LU-10683</key>
            <summary>write checksum errors</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="scadmin">SC Admin</reporter>
                        <labels>
                    </labels>
                <created>Tue, 20 Feb 2018 04:40:09 +0000</created>
                <updated>Wed, 28 Nov 2018 17:12:51 +0000</updated>
                            <resolved>Tue, 24 Jul 2018 16:08:46 +0000</resolved>
                                    <version>Lustre 2.10.3</version>
                                    <fixVersion>Lustre 2.12.0</fixVersion>
                    <fixVersion>Lustre 2.10.5</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>14</watches>
                                                                            <comments>
                            <comment id="221285" author="rjh" created="Tue, 20 Feb 2018 07:57:51 +0000"  >&lt;p&gt;the user running the code thinks that they went over group block quota when the above message occurred and the code hung. it was probably writing hdf5&apos;s.&lt;/p&gt;

&lt;p&gt;the zfs filesystem has compression turned on.&lt;/p&gt;
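&lt;p&gt;(for reference, one quick way to check whether the group really was over its block quota at the time &amp;#8211; group name and mount point below are taken from later in this ticket and may differ on your system)&lt;/p&gt;

```shell
# report block/inode usage and limits for the group on the Lustre mount
lfs quota -g oz009 /tmp/dagg
```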

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="221535" author="adilger" created="Fri, 23 Feb 2018 02:20:18 +0000"  >&lt;p&gt;While it is possible the quota issue is related, the client should still be able to finish writing the data it had cached before the quota was exceeded. &lt;/p&gt;

&lt;p&gt;It looks more like there was a problem with the bulk RPC that prevented the write from completing properly?  A few possible ways to mitigate this, and see if we can isolate the cause:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;reduce the ZFS record size to 1MB, since there has been relatively little testing with larger record size for 2.10&lt;/li&gt;
	&lt;li&gt;reduce RPC size to 1MB. Not that I think this is a likely culprit, since 4MB RPC size is fairly common, but just trying to reduce variables&lt;/li&gt;
	&lt;li&gt;disable compression on the dataset temporarily&lt;/li&gt;
&lt;/ul&gt;
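
&lt;p&gt;(a rough sketch of the above &amp;#8211; pool/dataset names are placeholders, and the zfs property changes only affect newly written blocks; the lctl setting is non-persistent and would need to be applied on each client)&lt;/p&gt;

```shell
# reduce ZFS recordsize from 2M to 1M on the OST datasets (new writes only)
zfs set recordsize=1M tank/ost0

# temporarily disable compression on the same dataset (new writes only)
zfs set compression=off tank/ost0

# on the clients: drop bulk RPC size from 4MB to 1MB (256 x 4KiB pages)
lctl set_param osc.*.max_pages_per_rpc=256
```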


&lt;p&gt;The &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9305&quot; title=&quot;Running File System Aging create write checksum errors &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9305&quot;&gt;&lt;del&gt;LU-9305&lt;/del&gt;&lt;/a&gt; bug was fixed just before the 2.10 release, but you should already have that fix in 2.10.3. &lt;/p&gt;</comment>
                            <comment id="221545" author="pjones" created="Fri, 23 Feb 2018 05:12:23 +0000"  >&lt;p&gt;Nathaniel&lt;/p&gt;

&lt;p&gt;Anything to add on this one?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="221635" author="scadmin" created="Mon, 26 Feb 2018 03:42:03 +0000"  >&lt;p&gt;I asked the user to run the code again and go over quota in a similar way, and there was nothing triggered from lustre this time. so unfortunately this may be hard to reproduce.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="223372" author="utopiabound" created="Mon, 12 Mar 2018 21:37:01 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=scadmin&quot; class=&quot;user-hover&quot; rel=&quot;scadmin&quot;&gt;scadmin&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Do you have full messages files from both the clients and servers?  In the messages file, it looks like only a single file is having this issue: FID 0x20000c02e:0x9ef:0x0&lt;/p&gt;

&lt;p&gt;To find the file (on a client):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lfs fid2path /PATH/TO/LUSTRE 0x20000c02e:0x9ef:0x0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="223479" author="rjh" created="Tue, 13 Mar 2018 06:02:52 +0000"  >&lt;p&gt;Hi Nathaniel,&lt;/p&gt;

&lt;p&gt;you mean more messages than in the attachment above? if you&amp;apos;d like a few hours before and after then I can do that... let me know. we also have conman (console) logs if you&amp;apos;d like, but I&amp;apos;m not sure there&amp;apos;s any more info in those.&lt;/p&gt;

&lt;p&gt;are parallel writes to zfs which exceed group quotas part of the lustre test suite y&apos;all run things through?&lt;/p&gt;

&lt;p&gt;looks like the file is still there...&lt;br/&gt;
we play some bind mount tricks to mount subdirs and fid2path wasn&amp;apos;t happy with those, but after I mounted the root of the fs directly on a tmp mountpoint then it worked ok (+/- the // in the filename?) -&amp;gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john5 ~]# lfs fid2path /tmp/dagg 0x20000c02e:0x9ef:0x0
/tmp/dagg/projects/oz009/N1024//snapshot_100.7.hdf5

[root@john5 ~]# ls -l /tmp/dagg/projects/oz009/N1024//snapshot_100.7.hdf5
-rw-rw-r-- 1 &amp;lt;username&amp;gt; oz009 1194134864 Feb 21 19:54 /tmp/dagg/projects/oz009/N1024//snapshot_100.7.hdf5

[root@john5 ~]# ls -lsh /tmp/dagg/projects/oz009/N1024//snapshot_100.7.hdf5
915M -rw-rw-r-- 1 &amp;lt;username&amp;gt; oz009 1.2G Feb 21 19:54 /tmp/dagg/projects/oz009/N1024//snapshot_100.7.hdf5

[root@john5 ~]# stat /tmp/dagg/projects/oz009/N1024//snapshot_100.7.hdf5
 File: &#8216;/tmp/dagg/projects/oz009/N1024//snapshot_100.7.hdf5&#8217;
 Size: 1194134864 Blocks: 1872277 IO Block: 4194304 regular file
Device: ef57e2ach/4015514284d Inode: 144116013481331183 Links: 1
Access: (0664/-rw-rw-r--) Uid: (10056/ &amp;lt;username&amp;gt;) Gid: (10204/ oz009)
Access: 2018-03-11 13:32:32.000000000 +1100
Modify: 2018-02-21 19:54:15.000000000 +1100
Change: 2018-03-05 16:53:29.000000000 +1100
 Birth: -

[root@john5 ~]# lfs getstripe /tmp/dagg/projects/oz009/N1024//snapshot_100.7.hdf5
/tmp/dagg/projects/oz009/N1024//snapshot_100.7.hdf5
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 5
 obdidx objid objid group
 5 2683981 0x28f44d 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="223511" author="utopiabound" created="Tue, 13 Mar 2018 14:18:20 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=rjh&quot; class=&quot;user-hover&quot; rel=&quot;rjh&quot;&gt;rjh&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;I&apos;m trying to figure out if it&apos;s just that file, or if other files are affected. So the messages file for a couple hours or even days would be useful.&lt;/p&gt;</comment>
                            <comment id="223616" author="rjh" created="Wed, 14 Mar 2018 16:01:17 +0000"  >&lt;p&gt;Hi Nathaniel,&lt;/p&gt;

&lt;p&gt;hmm, I just grep&amp;apos;d a bit more and this is worrying. now there are read and write checksum errors, and also to our small /home OSTs (OSS&amp;apos;s are umlaut1,2, which have zfs recordsize 1M and compression on).&lt;/p&gt;

&lt;p&gt;we have a lot of logspam at the moment from various things so I missed these &apos;til now &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/sad.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;can I give you (via email or something) a URL to download complete logs from? I don&apos;t want to put them here &apos;cos they have usernames etc. in them.&lt;/p&gt;

&lt;p&gt;all lustre server are now running zfs 0.7.6, and all servers and clients still lustre 2.10.3.&lt;br/&gt;
 clients on kernel 3.10.0-693.21.1.el7.x86_64, servers on 3.10.0-693.17.1.el7.x86_64&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/var/log/messages-20180220.gz:Feb 19 23:45:12 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111]: client csum efde5b36, server csum 54e81a18
/var/log/messages-20180220.gz:Feb 19 23:45:12 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111], original client csum efde5b36 (type 4), server csum 54e81a18 (type 4), client csum now efde5b36
/var/log/messages-20180220.gz:Feb 19 23:45:13 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111]: client csum efde5b36, server csum df7fda9d
/var/log/messages-20180220.gz:Feb 19 23:45:13 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111], original client csum efde5b36 (type 4), server csum df7fda9d (type 4), client csum now efde5b36
/var/log/messages-20180220.gz:Feb 19 23:45:15 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111]: client csum efde5b36, server csum 87da008a
/var/log/messages-20180220.gz:Feb 19 23:45:15 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111], original client csum efde5b36 (type 4), server csum 87da008a (type 4), client csum now efde5b36
/var/log/messages-20180220.gz:Feb 19 23:45:18 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111]: client csum efde5b36, server csum 1cc7a793
/var/log/messages-20180220.gz:Feb 19 23:45:18 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111], original client csum efde5b36 (type 4), server csum 1cc7a793 (type 4), client csum now efde5b36
/var/log/messages-20180220.gz:Feb 19 23:45:22 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111]: client csum efde5b36, server csum a7de2e17
/var/log/messages-20180220.gz:Feb 19 23:45:23 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111], original client csum efde5b36 (type 4), server csum a7de2e17 (type 4), client csum now efde5b36
/var/log/messages-20180220.gz:Feb 19 23:45:35 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111]: client csum efde5b36, server csum a7de2e17
/var/log/messages-20180220.gz:Feb 19 23:45:35 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111], original client csum efde5b36 (type 4), server csum a7de2e17 (type 4), client csum now efde5b36
/var/log/messages-20180220.gz:Feb 19 23:45:53 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111]: client csum efde5b36, server csum a7de2e17
/var/log/messages-20180220.gz:Feb 19 23:45:54 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [331258364-335450111], original client csum efde5b36 (type 4), server csum a7de2e17 (type 4), client csum now efde5b36
/var/log/messages-20180220.gz:Feb 19 23:46:31 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [358060376-362254335]: client csum 164c6d3b, server csum c5bdd26c
/var/log/messages-20180220.gz:Feb 19 23:46:32 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [358060376-362254335], original client csum 164c6d3b (type 4), server csum c5bdd26c (type 4), client csum now 164c6d3b
/var/log/messages-20180220.gz:Feb 19 23:47:38 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [389517656-393711615]: client csum f0550656, server csum ea7a06d9
/var/log/messages-20180220.gz:Feb 19 23:47:38 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [389517656-393711615], original client csum f0550656 (type 4), server csum ea7a06d9 (type 4), client csum now f0550656
/var/log/messages-20180220.gz:Feb 19 23:49:51 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [447804664-451997695]: client csum f6a340ce, server csum 62064124
/var/log/messages-20180220.gz:Feb 19 23:49:53 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [447804664-451997695], original client csum f6a340ce (type 4), server csum 62064124 (type 4), client csum now f6a340ce
/var/log/messages-20180220.gz:Feb 19 23:54:11 arkle3 kernel: LustreError: 168-f: dagg-OST0005: BAD WRITE CHECKSUM: from 12345-192.168.44.200@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [599407780-603598847]: client csum 2a11218d, server csum 5539349b
/var/log/messages-20180220.gz:Feb 19 23:54:12 john100 kernel: LustreError: 132-0: dagg-OST0005-osc-ffff88015a1ad800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.33@o2ib44 inode [0x20000c02e:0x9ef:0x0] object 0x0:2683981 extent [599407780-603598847], original client csum 2a11218d (type 4), server csum 5539349b (type 4), client csum now 2a11218d
/var/log/messages-20180223.gz:Feb 22 17:13:04 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.13@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343]: client csum c2abd616, server csum 8dd68c18
/var/log/messages-20180223.gz:Feb 22 17:13:04 farnarkle1 kernel: LustreError: 132-0: home-OST0000-osc-ffff88189a352800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343], original client csum c2abd616 (type 4), server csum 8dd68c18 (type 4), client csum now c2abd616
/var/log/messages-20180223.gz:Feb 22 17:13:05 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.13@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343]: client csum c2abd616, server csum 8dd68c18
/var/log/messages-20180223.gz:Feb 22 17:13:05 farnarkle1 kernel: LustreError: 132-0: home-OST0000-osc-ffff88189a352800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343], original client csum c2abd616 (type 4), server csum 8dd68c18 (type 4), client csum now c2abd616
/var/log/messages-20180223.gz:Feb 22 17:13:07 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.13@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343]: client csum c2abd616, server csum c1141035
/var/log/messages-20180223.gz:Feb 22 17:13:07 farnarkle1 kernel: LustreError: 132-0: home-OST0000-osc-ffff88189a352800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343], original client csum c2abd616 (type 4), server csum c1141035 (type 4), client csum now c2abd616
/var/log/messages-20180223.gz:Feb 22 17:13:10 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.13@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343]: client csum c2abd616, server csum 3e076abf
/var/log/messages-20180223.gz:Feb 22 17:13:11 farnarkle1 kernel: LustreError: 132-0: home-OST0000-osc-ffff88189a352800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343], original client csum c2abd616 (type 4), server csum 3e076abf (type 4), client csum now c2abd616
/var/log/messages-20180223.gz:Feb 22 17:13:15 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.13@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343]: client csum c2abd616, server csum 556fb060
/var/log/messages-20180223.gz:Feb 22 17:13:15 farnarkle1 kernel: LustreError: 132-0: home-OST0000-osc-ffff88189a352800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343], original client csum c2abd616 (type 4), server csum 556fb060 (type 4), client csum now c2abd616
/var/log/messages-20180223.gz:Feb 22 17:13:26 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.13@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343]: client csum c2abd616, server csum 8dd68c18
/var/log/messages-20180223.gz:Feb 22 17:13:26 farnarkle1 kernel: LustreError: 132-0: home-OST0000-osc-ffff88189a352800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343], original client csum c2abd616 (type 4), server csum 8dd68c18 (type 4), client csum now c2abd616
/var/log/messages-20180223.gz:Feb 22 17:13:44 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.13@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343]: client csum c2abd616, server csum d623d4d5
/var/log/messages-20180223.gz:Feb 22 17:13:44 farnarkle1 kernel: LustreError: 132-0: home-OST0000-osc-ffff88189a352800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x2000032e9:0x8a9c:0x0] object 0x0:10896796 extent [7152-13343], original client csum c2abd616 (type 4), server csum d623d4d5 (type 4), client csum now c2abd616
/var/log/messages-20180308.gz:Mar  7 15:03:06 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x44:0x0] object 0x0:11031820 extent [516-4611]: client csum 9977a425, server csum 5747b5ea
/var/log/messages-20180308.gz:Mar  7 15:03:06 john72 kernel: LustreError: 132-0: home-OST0001-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.52@o2ib44 inode [0x200004296:0x44:0x0] object 0x0:11031820 extent [516-4611], original client csum 9977a425 (type 4), server csum 5747b5ea (type 4), client csum now 9977a425
/var/log/messages-20180308.gz:Mar  7 15:03:07 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x44:0x0] object 0x0:11031820 extent [4620-8715]: client csum 5ef5f7db, server csum bf44f4ab
/var/log/messages-20180308.gz:Mar  7 15:03:07 john72 kernel: LustreError: 132-0: home-OST0001-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.52@o2ib44 inode [0x200004296:0x44:0x0] object 0x0:11031820 extent [4620-8715], original client csum 5ef5f7db (type 4), server csum bf44f4ab (type 4), client csum now 5ef5f7db
/var/log/messages-20180308.gz:Mar  7 15:03:16 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x45:0x0] object 0x0:11014864 extent [516-4611]: client csum 1dcd24ad, server csum 3a01165
/var/log/messages-20180308.gz:Mar  7 15:03:16 john72 kernel: LustreError: 132-0: home-OST0000-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x200004296:0x45:0x0] object 0x0:11014864 extent [516-4611], original client csum 1dcd24ad (type 4), server csum 3a01165 (type 4), client csum now 1dcd24ad
/var/log/messages-20180308.gz:Mar  7 15:03:17 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x45:0x0] object 0x0:11014864 extent [4620-8715]: client csum 27a49363, server csum 799787cb
/var/log/messages-20180308.gz:Mar  7 15:04:07 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x46:0x0] object 0x0:11014870 extent [516-4611]: client csum 8a6d82ff, server csum c06e206e
/var/log/messages-20180308.gz:Mar  7 15:04:07 john72 kernel: LustreError: 132-0: home-OST0000-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x200004296:0x46:0x0] object 0x0:11014870 extent [516-4611], original client csum 8a6d82ff (type 4), server csum c06e206e (type 4), client csum now 8a6d82ff
/var/log/messages-20180308.gz:Mar  7 15:04:50 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x47:0x0] object 0x0:11014876 extent [516-4611]: client csum e76922f6, server csum ad6a8067
/var/log/messages-20180308.gz:Mar  7 15:04:50 john72 kernel: LustreError: 132-0: home-OST0000-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x200004296:0x47:0x0] object 0x0:11014876 extent [516-4611], original client csum e76922f6 (type 4), server csum ad6a8067 (type 4), client csum now e76922f6
/var/log/messages-20180308.gz:Mar  7 15:05:14 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x48:0x0] object 0x0:11031833 extent [516-4611]: client csum c606d023, server csum e53f9b17
/var/log/messages-20180308.gz:Mar  7 15:05:14 john72 kernel: LustreError: 132-0: home-OST0001-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.52@o2ib44 inode [0x200004296:0x48:0x0] object 0x0:11031833 extent [516-4611], original client csum c606d023 (type 4), server csum e53f9b17 (type 4), client csum now c606d023
/var/log/messages-20180308.gz:Mar  7 15:05:24 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x49:0x0] object 0x0:11014877 extent [516-4611]: client csum d80dfda8, server csum 89d05138
/var/log/messages-20180308.gz:Mar  7 15:07:44 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x4a:0x0] object 0x0:11014894 extent [516-4611]: client csum 60a143e6, server csum 2aa2e177
/var/log/messages-20180308.gz:Mar  7 15:07:44 john72 kernel: LustreError: 132-0: home-OST0000-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x200004296:0x4a:0x0] object 0x0:11014894 extent [516-4611], original client csum 60a143e6 (type 4), server csum 2aa2e177 (type 4), client csum now 60a143e6
/var/log/messages-20180308.gz:Mar  7 15:15:05 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x4b:0x0] object 0x0:11014906 extent [516-4611]: client csum ff1728a9, server csum 8ba1a1e0
/var/log/messages-20180308.gz:Mar  7 15:15:05 john72 kernel: LustreError: 132-0: home-OST0000-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x200004296:0x4b:0x0] object 0x0:11014906 extent [516-4611], original client csum ff1728a9 (type 4), server csum 8ba1a1e0 (type 4), client csum now ff1728a9
/var/log/messages-20180308.gz:Mar  7 15:15:16 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x4c:0x0] object 0x0:11031863 extent [516-4611]: client csum cca2b223, server csum 3a7a9a4e
/var/log/messages-20180308.gz:Mar  7 15:15:23 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x4e:0x0] object 0x0:11031864 extent [516-4611]: client csum ee48cab1, server csum 6f69a9ca
/var/log/messages-20180308.gz:Mar  7 15:15:43 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x50:0x0] object 0x0:11031865 extent [516-4611]: client csum c8bdaad3, server csum 82be0842
/var/log/messages-20180308.gz:Mar  7 15:15:46 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x51:0x0] object 0x0:11014909 extent [516-4611]: client csum 57497f3, server csum 4f773562
/var/log/messages-20180308.gz:Mar  7 15:16:21 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x52:0x0] object 0x0:11031866 extent [516-4611]: client csum 179adff1, server csum d885540d
/var/log/messages-20180308.gz:Mar  7 15:16:21 john72 kernel: LustreError: 132-0: home-OST0001-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.52@o2ib44 inode [0x200004296:0x52:0x0] object 0x0:11031866 extent [516-4611], original client csum 179adff1 (type 4), server csum d885540d (type 4), client csum now 179adff1
/var/log/messages-20180308.gz:Mar  7 15:19:50 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x59:0x0] object 0x0:11014910 extent [516-4611]: client csum 98a90fb3, server csum d2aaad22
/var/log/messages-20180308.gz:Mar  7 15:19:50 john72 kernel: LustreError: 132-0: home-OST0000-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x200004296:0x59:0x0] object 0x0:11014910 extent [516-4611], original client csum 98a90fb3 (type 4), server csum d2aaad22 (type 4), client csum now 98a90fb3
/var/log/messages-20180308.gz:Mar  7 15:20:08 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x5d:0x0] object 0x0:11031867 extent [516-4611]: client csum 34fda7cb, server csum e078ee98
/var/log/messages-20180308.gz:Mar  7 15:21:15 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x64:0x0] object 0x0:11031870 extent [516-4611]: client csum 24b13698, server csum 6eb29409
/var/log/messages-20180308.gz:Mar  7 15:22:56 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x67:0x0] object 0x0:11014915 extent [516-4611]: client csum fcb4e51c, server csum ffb2a769
/var/log/messages-20180308.gz:Mar  7 15:23:50 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x6a:0x0] object 0x0:11031873 extent [516-4611]: client csum a1a66091, server csum 320a1aef
/var/log/messages-20180308.gz:Mar  7 15:25:07 john72 kernel: LustreError: 132-0: home-OST0001-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.52@o2ib44 inode [0x200004296:0x6e:0x0] object 0x0:11031875 extent [516-4611], original client csum 4a4e93eb (type 4), server csum fe6a3a5f (type 4), client csum now 4a4e93eb
/var/log/messages-20180308.gz:Mar  7 15:27:16 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x75:0x0] object 0x0:11014922 extent [516-4611]: client csum 629c147f, server csum 289fb6ee
/var/log/messages-20180308.gz:Mar  7 15:30:39 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x88:0x0] object 0x0:11031881 extent [516-4611]: client csum b0252143, server csum fa2683d2
/var/log/messages-20180308.gz:Mar  7 15:51:08 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.172@o2ib44 inode [0x200004296:0x89:0x0] object 0x0:11014925 extent [516-4611]: client csum f7b9952b, server csum 539602f8
/var/log/messages-20180308.gz:Mar  7 15:51:08 john72 kernel: LustreError: 132-0: home-OST0000-osc-ffff8817d9a20800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x200004296:0x89:0x0] object 0x0:11014925 extent [516-4611], original client csum f7b9952b (type 4), server csum 539602f8 (type 4), client csum now f7b9952b
/var/log/messages-20180308.gz:Mar  7 17:44:25 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.173@o2ib44 inode [0x200004a85:0x1d:0x0] object 0x0:11015246 extent [516-4611]: client csum c6b24442, server csum bb49caf9
/var/log/messages-20180308.gz:Mar  7 17:44:25 john73 kernel: LustreError: 132-0: home-OST0000-osc-ffff882fd67d1800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x200004a85:0x1d:0x0] object 0x0:11015246 extent [516-4611], original client csum c6b24442 (type 4), server csum bb49caf9 (type 4), client csum now c6b24442
/var/log/messages-20180308.gz:Mar  7 17:44:27 john73 kernel: LustreError: 132-0: home-OST0000-osc-ffff882fd67d1800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x200004a85:0x1d:0x0] object 0x0:11015246 extent [4620-8715], original client csum bfce872c (type 4), server csum 850e6171 (type 4), client csum now bfce872c
/var/log/messages-20180308.gz:Mar  7 18:00:30 umlaut2 kernel: LustreError: 168-f: home-OST0001: BAD WRITE CHECKSUM: from 12345-192.168.44.173@o2ib44 inode [0x200004a85:0x20:0x0] object 0x0:11032280 extent [516-4611]: client csum 941ab186, server csum 43900b50
/var/log/messages-20180308.gz:Mar  7 18:00:30 john73 kernel: LustreError: 132-0: home-OST0001-osc-ffff882fd67d1800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.52@o2ib44 inode [0x200004a85:0x20:0x0] object 0x0:11032280 extent [516-4611], original client csum 941ab186 (type 4), server csum 43900b50 (type 4), client csum now 941ab186
/var/log/messages-20180308.gz:Mar  7 18:00:31 john73 kernel: LustreError: 132-0: home-OST0001-osc-ffff882fd67d1800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.52@o2ib44 inode [0x200004a85:0x20:0x0] object 0x0:11032280 extent [4620-8715], original client csum 4da5042f (type 4), server csum 1a0af759 (type 4), client csum now 4da5042f
/var/log/messages-20180308.gz:Mar  7 18:00:42 umlaut1 kernel: LustreError: 168-f: home-OST0000: BAD WRITE CHECKSUM: from 12345-192.168.44.173@o2ib44 inode [0x200004a85:0x21:0x0] object 0x0:11015326 extent [516-4611]: client csum 51cfa889, server csum 308c8cb3
/var/log/messages-20180308.gz:Mar  7 18:00:42 john73 kernel: LustreError: 132-0: home-OST0000-osc-ffff882fd67d1800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.51@o2ib44 inode [0x200004a85:0x21:0x0] object 0x0:11015326 extent [516-4611], original client csum 51cfa889 (type 4), server csum 308c8cb3 (type 4), client csum now 51cfa889
/var/log/messages-20180308.gz:Mar  7 18:00:59 john73 kernel: LustreError: 132-0: home-OST0001-osc-ffff882fd67d1800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 192.168.44.52@o2ib44 inode [0x200004a85:0x22:0x0] object 0x0:11032283 extent [29924-72851], original client csum 8d6ddb8c (type 4), server csum c3ce8506 (type 4), client csum now 8d6ddb8c
/var/log/messages-20180314.gz:Mar 13 15:23:06 john57 kernel: LustreError: 133-1: dagg-OST0004-osc-ffff881899dd5800: BAD READ CHECKSUM: from 192.168.44.33@o2ib44 inode [0x28001c1e4:0x128:0x0] object 0x540000400:8139196 extent [0-4095], client 9187de52, server 3ea5bb6b, cksum_type 4
/var/log/messages-20180314.gz:Mar 13 15:23:06 john56 kernel: LustreError: 133-1: dagg-OST0003-osc-ffff8817c5166000: BAD READ CHECKSUM: from 192.168.44.32@o2ib44 inode [0x28001c1c2:0x16c:0x0] object 0x500000400:8116043 extent [0-4095], client 37af0280, server a79ccd9f, cksum_type 4
/var/log/messages-20180314.gz:Mar 13 15:23:06 john63 kernel: LustreError: 133-1: dagg-OST000c-osc-ffff88189a274000: BAD READ CHECKSUM: from 192.168.44.37@o2ib44 inode [0x28001c1c4:0x172:0x0] object 0x400000400:8173179 extent [0-4095], client 8cdf64cc, server 4df3e0bb, cksum_type 4
/var/log/messages-20180314.gz:Mar 13 15:23:07 arkle7 kernel: LustreError: 132-0: dagg-OST000c: BAD READ CHECKSUM: should have changed on the client or in transit: from 192.168.44.163@o2ib44 inode [0x28001c1c4:0x172:0x0] object 0x400000400:8173179 extent [0-4095], client returned csum 8cdf64cc (type 4), server csum 4df3e0bb (type 4)
/var/log/messages-20180314.gz:Mar 13 15:23:07 arkle3 kernel: LustreError: 132-0: dagg-OST0004: BAD READ CHECKSUM: should have changed on the client or in transit: from 192.168.44.157@o2ib44 inode [0x28001c1e4:0x128:0x0] object 0x540000400:8139196 extent [0-4095], client returned csum 9187de52 (type 4), server csum 3ea5bb6b (type 4)
/var/log/messages-20180314.gz:Mar 13 15:23:07 arkle2 kernel: LustreError: 132-0: dagg-OST0003: BAD READ CHECKSUM: should have changed on the client or in transit: from 192.168.44.156@o2ib44 inode [0x28001c1c2:0x16c:0x0] object 0x500000400:8116043 extent [0-4095], client returned csum 37af0280 (type 4), server csum a79ccd9f (type 4)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;cheers,&lt;br/&gt;
 robin&lt;/p&gt;</description>
                            <comment id="223625" author="utopiabound" created="Wed, 14 Mar 2018 17:51:47 +0000"  >&lt;p&gt;Below are instructions for uploading logs to our write-only ftp site:&lt;/p&gt;

&lt;p&gt;Sometimes the diagnostic data collected as part of Lustre troubleshooting is too large to be attached to a JIRA ticket. For these cases, HPDD provides an anonymous write-only FTP upload service. In order to use this service, you&apos;ll need an FTP client (e.g. ncftp, ftp, etc.) and a JIRA issue. Use the &apos;uploads&apos; directory and create a new subdirectory using your Jira issue as a name.&lt;/p&gt;

&lt;p&gt;In the following example, there are three debug logs in a single directory and the JIRA issue &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4242&quot; title=&quot;mdt_open.c:1685:mdt_reint_open()) LBUG&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4242&quot;&gt;&lt;del&gt;LU-4242&lt;/del&gt;&lt;/a&gt; has been created. After completing the upload, please update the relevant issue with a note mentioning the upload, so that our engineers know where to find your logs.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ ls -lh
total 333M
-rw-r--r-- 1 mjmac mjmac  98M Feb 23 17:36 mds-debug
-rw-r--r-- 1 mjmac mjmac 118M Feb 23 17:37 oss-00-debug
-rw-r--r-- 1 mjmac mjmac 118M Feb 23 17:37 oss-01-debug
$ ncftp ftp.hpdd.intel.com
NcFTP 3.2.2 (Sep 04, 2008) by Mike Gleason (http://www.NcFTP.com/contact/).
Connecting to 99.96.190.235...
(vsFTPd 2.2.2)
Logging in...
Login successful.
Logged in to ftp.hpdd.intel.com.
ncftp / &amp;gt; cd uploads
Directory successfully changed.
ncftp /uploads &amp;gt; mkdir LU-4242
ncftp /uploads &amp;gt; cd LU-4242
Directory successfully changed.
ncftp /uploads/LU-4242 &amp;gt; put *
mds-debug:                                              97.66 MB   11.22 MB/s
oss-00-debug:                                          117.19 MB   11.16 MB/s
oss-01-debug:                                          117.48 MB   11.18 MB/s
ncftp /uploads/LU-4242 &amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Please note that this is a WRITE-ONLY FTP service, so you will not be able to see (with ls) the files or directories you&apos;ve created, nor will you (or anyone other than HPDD staff) be able to see or read them.&lt;/p&gt;</comment>
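The manual ncftp session above (cd uploads, mkdir with the issue key, put the files) can be scripted. A minimal sketch using Python's stdlib ftplib; the function names and the issue-key validation are illustrative additions, while the host, directory layout, and anonymous write-only login come from the instructions above:

```python
import re
from ftplib import FTP

def remote_dir(issue_key):
    """Return the upload path for a JIRA issue key such as 'LU-4242'.

    Rejects anything that does not look like PROJECT-NUMBER, so a typo
    cannot create a stray top-level directory on the upload server.
    """
    if not re.fullmatch(r"[A-Z]+-\d+", issue_key):
        raise ValueError("bad issue key: %r" % issue_key)
    return "/uploads/" + issue_key

def upload_logs(host, issue_key, paths):
    # Anonymous login; the service is write-only, so there is no way to
    # list or re-read the files afterwards.
    ftp = FTP(host)
    ftp.login()
    ftp.mkd(remote_dir(issue_key))
    ftp.cwd(remote_dir(issue_key))
    for path in paths:
        with open(path, "rb") as fh:
            ftp.storbinary("STOR " + path, fh)
    ftp.quit()
```

Called as, for example, upload_logs("ftp.hpdd.intel.com", "LU-4242", ["mds-debug", "oss-00-debug", "oss-01-debug"]), mirroring the session shown above.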
                            <comment id="223735" author="rjh" created="Thu, 15 Mar 2018 14:12:10 +0000"  >&lt;p&gt;Hi Nathaniel,&lt;/p&gt;

&lt;p&gt;all messages for the lustre servers and from the 7 clients affected so far for all of 2018 have been uploaded.&lt;br/&gt;
grep for CHECKSUM&lt;/p&gt;

&lt;p&gt;if you&apos;d like console logs from these machines too then please let us know.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="223814" author="rjh" created="Fri, 16 Mar 2018 04:15:39 +0000"  >&lt;p&gt;a bunch more CHECKSUM and LNet errors today. this lot was again definitely associated with over-quota events. I don&apos;t know if all the incidents are, though...&lt;/p&gt;

&lt;p&gt;I guess I&apos;d be very worried if these weren&apos;t purely over quota events, which is why the read checksum messages were very alarming.&lt;/p&gt;

&lt;p&gt;any thoughts on whether these are just from quota events or not? is there any way we can easily tell that?&lt;/p&gt;

&lt;p&gt;I&apos;ll attach messages for today&apos;s errors to this ticket in a min.&lt;/p&gt;

&lt;p&gt;the user has many (~1100 so far) of the below in their job output. looks like they have ~127 write processes across their 28 nodes and 896 cores. the code is looping on the nodes trying to complete the writes.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;HDF5-DIAG: Error detected in HDF5 (1.10.1) MPI-process 168:
  #000: H5Dio.c line 269 in H5Dwrite(): can&apos;t prepare for writing data
    major: Dataset
    minor: Write failed
  #001: H5Dio.c line 345 in H5D__pre_write(): can&apos;t write data
    major: Dataset
    minor: Write failed
  #002: H5Dio.c line 791 in H5D__write(): can&apos;t write data
    major: Dataset
    minor: Write failed
  #003: H5Dcontig.c line 642 in H5D__contig_write(): contiguous write failed
    major: Dataset
    minor: Write failed
  #004: H5Dselect.c line 309 in H5D__select_write(): write error
    major: Dataspace
    minor: Write failed
  #005: H5Dselect.c line 220 in H5D__select_io(): write error
    major: Dataspace
    minor: Write failed
  #006: H5Dcontig.c line 1267 in H5D__contig_writevv(): can&apos;t perform vectorized sieve buffer write
    major: Dataset
    minor: Can&apos;t operate on object
  #007: H5VM.c line 1500 in H5VM_opvv(): can&apos;t perform operation
    major: Internal error (too specific to document in detail)
    minor: Can&apos;t operate on object
  #008: H5Dcontig.c line 1014 in H5D__contig_writevv_sieve_cb(): block write failed
    major: Dataset
    minor: Write failed
  #009: H5Fio.c line 195 in H5F_block_write(): write through page buffer failed
    major: Low-level I/O
    minor: Write failed
  #010: H5PB.c line 1041 in H5PB_write(): write through metadata accumulator failed
    major: Page Buffering
    minor: Write failed
  #011: H5Faccum.c line 834 in H5F__accum_write(): file write failed
    major: Low-level I/O
    minor: Write failed
  #012: H5FDint.c line 308 in H5FD_write(): driver write request failed
    major: Virtual File Layer
    minor: Write failed
  #013: H5FDsec2.c line 810 in H5FD_sec2_write(): file write failed: time = Fri Mar 16 03:17:20 2018
, filename = &apos;./L35_N2650/snapshot_050.24.hdf5&apos;, file descriptor = 24, errno = 122, error message = &apos;Disk quota exceeded&apos;, buf = 0x2b07c54a3010, total write size = 31457280, bytes this sub-write = 31457280, bytes actually written = 184467440737052
    major: Low-level I/O
    minor: Write failed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
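For reference, the errno = 122 in the trace above is EDQUOT on Linux ("Disk quota exceeded"), which supports the over-quota reading; the "bytes actually written = 184467440737052" figure is far larger than the 31457280-byte request, so it is not a meaningful byte count. A quick symbolic check of the errno:

```python
import errno
import os

# The HDF5 trace reports errno = 122 with "Disk quota exceeded".
# On Linux, errno.EDQUOT is 122; resolving it symbolically confirms
# the failure is a quota event rather than an I/O fault.
assert errno.errorcode[errno.EDQUOT] == "EDQUOT"
print(errno.EDQUOT, os.strerror(errno.EDQUOT))
```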

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="224527" author="pjones" created="Mon, 26 Mar 2018 12:38:23 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Can you please advise with this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="224627" author="adilger" created="Tue, 27 Mar 2018 17:34:42 +0000"  >&lt;p&gt;Hi Robin, did you reduce the &lt;tt&gt;recordsize&lt;/tt&gt; to 1MB on the filesystem?  We haven&apos;t done any testing ourselves with larger recordsize. The clients would need to be remounted to also get a smaller RPC size (they default to RPC size == max blocksize for ZFS at mount).&lt;/p&gt;

&lt;p&gt;Also, are you using TID RDMA (&lt;tt&gt;cap_mask=....&lt;/tt&gt;) for your OPA connection?  We&apos;ve seen problems with that under load, and if yes it should be disabled. &lt;/p&gt;</comment>
                            <comment id="224628" author="green" created="Tue, 27 Mar 2018 17:36:43 +0000"  >&lt;p&gt;Which library do you use for this I/O (I assume it is hdf5, from the H5 prefix), and does it use direct I/O internally by any chance?&lt;/p&gt;</comment>
                            <comment id="224683" author="rjh" created="Wed, 28 Mar 2018 04:45:09 +0000"  >&lt;p&gt;Hi Andreas,&lt;/p&gt;

&lt;p&gt;on our big /dagg filesystem I left the zfs recordsize at 2MB. it&apos;s a significant performance loss to set it to 1M - it halves the size of all the I/Os to disk.&lt;/p&gt;

&lt;p&gt;however these events also happen to the /home lustre filesystem where the zfs recordsize is 1M and always has been. those are in the logs above.&lt;/p&gt;

&lt;p&gt;oh, damn, looks like /sys/module/zfs/parameters/zfs_max_recordsize=2M for these /home filesystems though, even though nothing uses that. that must be a leftover from testing.&lt;/p&gt;

&lt;p&gt;I haven&apos;t tried to change the RPC size for anything, sorry. all the lustre filesystems in this cluster seem to be using max_pages_per_rpc=1024. is that the right number to look at? we haven&apos;t tweaked anything around that. I just assumed 4M was the default these days.&lt;/p&gt;
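max_pages_per_rpc is indeed the right tunable to look at: it counts client pages, so 1024 pages at the usual 4 KiB x86_64 page size (an assumption here) gives the 4 MiB default RPC mentioned above. The arithmetic, as a quick check:

```python
def rpc_size_bytes(max_pages_per_rpc, page_size=4096):
    # Lustre sizes bulk RPCs in units of client pages; 4096 bytes
    # assumes the standard x86_64 page size.
    return max_pages_per_rpc * page_size

# 1024 pages of 4 KiB is 4 MiB, matching the 4M default discussed above.
assert rpc_size_bytes(1024) == 4 * 1024 * 1024
```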

&lt;p&gt;we aren&apos;t using any cap_mask= options for hfi1. on clients it is&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;options hfi1 sge_copy_mode=2 krcvqs=4 piothreshold=0 wss_threshold=70 max_mtu=10240 eager_buffer_size=4194304
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and on lustre servers it&apos;s&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;options hfi1 sge_copy_mode=2 krcvqs=8 piothreshold=0 wss_threshold=70
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;with max_mtu=10240 being a driver default, and the default eager_buffer_size is 2097152. but AFAIK lustre&apos;s verbs doesn&apos;t use that eager_buffer stuff - it&apos;s for PSM2 comms I think.&lt;/p&gt;

&lt;p&gt;I actually don&apos;t know what most of these options do. it&apos;s just what various Intel people and docs told us was good.&lt;br/&gt;
only krcvqs seems to impact performance (&amp;lt;=half a socket seems best), so we&apos;ve tuned that a bit.&lt;/p&gt;

&lt;p&gt;Oleg - yes it&apos;s a hdf5 library. we&apos;re looking into the hdf5 code and will try to communicate with the user and see what options they used when calling it in parallel. I suspect it uses whatever MPIIO uses, but haven&apos;t looked at either of those in years.&lt;/p&gt;

&lt;p&gt;thanks for looking into this.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="225583" author="ashehata" created="Tue, 10 Apr 2018 02:19:09 +0000"  >&lt;p&gt;Is it possible to turn on net logging and capture logs for a short period of time when this happens:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;lctl set_param debug=+&lt;span class=&quot;code-quote&quot;&gt;&quot;net neterror&quot;&lt;/span&gt;
lctl dk &amp;gt; log&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
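For intermittent errors like these, one way to use the commands above is to watch syslog and dump the Lustre debug buffer automatically the moment a checksum error scrolls past. A minimal sketch; the log path, output directory, and helper names are assumptions, and lctl dk is the same command as above:

```python
import re
import subprocess
import time

CHECKSUM_RE = re.compile(r"BAD (READ|WRITE) CHECKSUM")

def is_checksum_event(line):
    # Matches both the client-side (132-0 / 133-1) and server-side
    # (168-f) console messages shown earlier in this ticket.
    return bool(CHECKSUM_RE.search(line))

def watch(logfile="/var/log/messages", outdir="/tmp"):
    # Follow the log like 'tail -F' and run 'lctl dk' into a timestamped
    # file whenever a checksum error appears.
    with open(logfile) as fh:
        fh.seek(0, 2)                      # start at end of file
        while True:
            line = fh.readline()
            if not line:
                time.sleep(1)
                continue
            if is_checksum_event(line):
                out = "%s/lustre-dk-%d.log" % (outdir, int(time.time()))
                with open(out, "w") as dk:
                    subprocess.run(["lctl", "dk"], stdout=dk, check=False)
```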
                            <comment id="225592" author="scadmin" created="Tue, 10 Apr 2018 07:04:50 +0000"  >&lt;p&gt;Hi Amir,&lt;/p&gt;

&lt;p&gt;the incidents are rare. the last one was 3 days ago. probably over-quota related. we&apos;ve tried to reproduce them but haven&apos;t managed to with simple dd&apos;s etc.&lt;/p&gt;

&lt;p&gt;I guess I could turn on &apos;net neterror&apos; debug and write a script to tail syslog and automatically run dk&apos;s when it sees the next burst of CHECKSUM&apos;s.&lt;br/&gt;
does that sound ok?&lt;br/&gt;
do you want dk just from servers or clients too?&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="225617" author="ashehata" created="Tue, 10 Apr 2018 14:44:02 +0000"  >&lt;p&gt;Hi Robin,&lt;/p&gt;

&lt;p&gt;From both client and server would be great, separated into two files respectively.&lt;/p&gt;

&lt;p&gt;thanks&lt;br/&gt;
amir&lt;/p&gt;</comment>
                            <comment id="225873" author="scadmin" created="Thu, 12 Apr 2018 15:25:47 +0000"  >&lt;p&gt;Hi Amir,&lt;/p&gt;

&lt;p&gt;I set up a script to tail syslog and run dk on anything that hits a CHECKSUM error.&lt;br/&gt;
however it doesn&apos;t appear safe to turn on +net.&lt;br/&gt;
I&apos;ve seen 132 stack traces like these across all the servers and clients since I turned on +net this afternoon, so I&apos;ve now turned it off.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Apr 12 18:57:25 arkle5 kernel: ------------[ cut here ]------------
Apr 12 18:57:25 arkle5 kernel: WARNING: CPU: 1 PID: 127223 at kernel/softirq.c:151 __local_bh_enable_ip+0x82/0xb0
Apr 12 18:57:25 arkle5 kernel: Modules linked in: sctp_diag sctp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag ip6table_filter ip6_tables iptable_filter osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) mptctl mptbase 8021q garp mrp stp llc hfi1 sunrpc xfs dm_round_robin dcdbas intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr iTCO_wdt iTCO_vendor_support zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) mgag200 ttm dm_multipath drm_kms_helper ses syscopyarea enclosure dm_mod sysfillrect sysimgblt fb_sys_fops drm mei_me lpc_ich shpchp i2c_i801 sg mei
Apr 12 18:57:25 arkle5 kernel: ipmi_si ipmi_devintf nfit ipmi_msghandler libnvdimm acpi_power_meter tpm_crb rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm iw_cm binfmt_misc ip_tables ib_ipoib ib_cm sr_mod cdrom sd_mod crc_t10dif crct10dif_generic bonding bnx2x rdmavt ahci i2c_algo_bit libahci crct10dif_pclmul mpt3sas mdio crct10dif_common i2c_core crc32c_intel ptp raid_class ib_core libata megaraid_sas scsi_transport_sas pps_core libcrc32c [last unloaded: hfi1]
Apr 12 18:57:25 arkle5 kernel: CPU: 1 PID: 127223 Comm: hfi1_cq0 Tainted: P           OE  ------------   3.10.0-693.17.1.el7.x86_64 #1
Apr 12 18:57:25 arkle5 kernel: Hardware name: Dell Inc. PowerEdge R740/06G98X, BIOS 1.3.7 02/08/2018
Apr 12 18:57:25 arkle5 kernel: Call Trace:
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff816a6071&amp;gt;] dump_stack+0x19/0x1b
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff810895e8&amp;gt;] __warn+0xd8/0x100
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff8108972d&amp;gt;] warn_slowpath_null+0x1d/0x20
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff81091be2&amp;gt;] __local_bh_enable_ip+0x82/0xb0
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff816ade8e&amp;gt;] _raw_spin_unlock_bh+0x1e/0x20
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffffc06183b5&amp;gt;] cfs_trace_unlock_tcd+0x55/0x90 [libcfs]
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffffc0623708&amp;gt;] libcfs_debug_vmsg2+0x6d8/0xb40 [libcfs]
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff810cfb6c&amp;gt;] ? dequeue_entity+0x11c/0x5d0
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff810c95d5&amp;gt;] ? sched_clock_cpu+0x85/0xc0
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff8102954d&amp;gt;] ? __switch_to+0xcd/0x500
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffffc0623bc7&amp;gt;] libcfs_debug_msg+0x57/0x80 [libcfs]
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffffc069682a&amp;gt;] kiblnd_cq_completion+0x11a/0x160 [ko2iblnd]
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffffc03ab4a2&amp;gt;] send_complete+0x32/0x50 [rdmavt]
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff810b2ac0&amp;gt;] kthread_worker_fn+0x80/0x180
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff810b2a40&amp;gt;] ? kthread_stop+0xe0/0xe0
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff810b270f&amp;gt;] kthread+0xcf/0xe0
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff810b2640&amp;gt;] ? insert_kthread_work+0x40/0x40
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff816b8798&amp;gt;] ret_from_fork+0x58/0x90
Apr 12 18:57:25 arkle5 kernel: [&amp;lt;ffffffff810b2640&amp;gt;] ? insert_kthread_work+0x40/0x40
Apr 12 18:57:25 arkle5 kernel: ---[ end trace aaf779f5b67c32db ]---
Apr 12 18:57:25 arkle7 kernel: ------------[ cut here ]------------
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I also have one client that looks permanently upset now (john50 is a client, arkle3 is an OSS)&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Apr 12 21:28:05 john50 kernel: LNetError: 909:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet from 12345-192.168.44.33@o2ib44, match 1594448751341360 length 1048576 too big: 1045288 left, 1045288 allo
wed
Apr 12 21:28:05 arkle3 kernel: LustreError: 296380:0:(events.c:449:server_bulk_callback()) event type 5, status -61, desc ffff880c21c90a00
Apr 12 21:28:05 arkle3 kernel: LustreError: 296380:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff880c21c90a00
Apr 12 21:28:05 arkle3 kernel: LustreError: 233673:0:(ldlm_lib.c:3242:target_bulk_io()) @@@ network error on bulk WRITE  req@ffff8817643ce450 x1594448751341360/t0(0) o4-&amp;gt;98af46e3-9fa3-6f5b-2dcd-f89325115978@192.168.44.150@o2ib44:626/0 lens 608/448 e 0 to 0 dl 1523532491 ref 1 fl Interpret:/0/0 rc 0/0
Apr 12 21:28:05 arkle3 kernel: Lustre: dagg-OST0004: Bulk IO write error with 98af46e3-9fa3-6f5b-2dcd-f89325115978 (at 192.168.44.150@o2ib44), client will retry: rc = -110
Apr 12 21:28:12 john50 kernel: Lustre: 933:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1523532485/real 1523532485]  req@ffff8817b5bd6900 x1594448751341360/t0(0) o4-&amp;gt;dagg-OST0004-osc-ffff88189aa1f000@192.168.44.33@o2ib44:6/4 lens 608/448 e 0 to 1 dl 1523532492 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Apr 12 21:28:12 john50 kernel: Lustre: 933:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
Apr 12 21:28:12 john50 kernel: Lustre: dagg-OST0004-osc-ffff88189aa1f000: Connection to dagg-OST0004 (at 192.168.44.33@o2ib44) was lost; in progress operations using this service will wait for recovery to complete
Apr 12 21:28:12 john50 kernel: Lustre: Skipped 3 previous similar messages
Apr 12 21:28:12 arkle3 kernel: Lustre: dagg-OST0004: Client 98af46e3-9fa3-6f5b-2dcd-f89325115978 (at 192.168.44.150@o2ib44) reconnecting
Apr 12 21:28:12 arkle3 kernel: Lustre: dagg-OST0004: Connection restored to 98af46e3-9fa3-6f5b-2dcd-f89325115978 (at 192.168.44.150@o2ib44)
Apr 12 21:28:12 john50 kernel: Lustre: dagg-OST0004-osc-ffff88189aa1f000: Connection restored to 192.168.44.33@o2ib44 (at 192.168.44.33@o2ib44)
Apr 12 21:28:12 john50 kernel: Lustre: Skipped 2 previous similar messages
Apr 12 21:28:12 john50 kernel: LNetError: 910:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet from 12345-192.168.44.33@o2ib44, match 1594448751343216 length 1048576 too big: 1045288 left, 1045288 allowed
Apr 12 21:28:12 arkle3 kernel: LustreError: 296379:0:(events.c:449:server_bulk_callback()) event type 5, status -61, desc ffff88169ed25a00
Apr 12 21:28:12 arkle3 kernel: LustreError: 296379:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff88169ed25a00
Apr 12 21:28:12 arkle3 kernel: LustreError: 50811:0:(ldlm_lib.c:3242:target_bulk_io()) @@@ network error on bulk WRITE  req@ffff8816dd9f2c50 x1594448751341360/t0(0) o4-&amp;gt;98af46e3-9fa3-6f5b-2dcd-f89325115978@192.168.44.150@o2ib44:633/0 lens 608/448 e 0 to 0 dl 1523532498 ref 1 fl Interpret:/2/0 rc 0/0
Apr 12 21:28:12 arkle3 kernel: Lustre: dagg-OST0004: Bulk IO write error with 98af46e3-9fa3-6f5b-2dcd-f89325115978 (at 192.168.44.150@o2ib44), client will retry: rc = -110
Apr 12 21:28:19 john50 kernel: Lustre: dagg-OST0004-osc-ffff88189aa1f000: Connection to dagg-OST0004 (at 192.168.44.33@o2ib44) was lost; in progress operations using this service will wait for recovery to complete
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I&apos;ll reboot the client as (IIRC) this has cleared up this kind of problem in the past.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Apr 13 01:22:00 arkle3 kernel: LustreError: 296378:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff880f6ea28800
Apr 13 01:22:07 arkle3 kernel: LustreError: 296377:0:(events.c:449:server_bulk_callback()) event type 5, status -61, desc ffff88150b4d4800
Apr 13 01:22:07 arkle3 kernel: LustreError: 296377:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff88150b4d4800
Apr 13 01:22:15 arkle3 kernel: LustreError: 296377:0:(events.c:449:server_bulk_callback()) event type 5, status -61, desc ffff88150b4d2800
Apr 13 01:22:15 arkle3 kernel: LustreError: 296377:0:(events.c:449:server_bulk_callback()) event type 3, status -61, desc ffff88150b4d2800
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;debug +neterror (a default) is still enabled, so the dk logs will still catch that. Hopefully that will be enough for you.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;
</comment>
                            <comment id="226226" author="scadmin" created="Wed, 18 Apr 2018 08:53:13 +0000"  >&lt;p&gt;the script triggered on 3 sets of checksum errors last night.&lt;br/&gt;
dk&apos;s attached in 2018-04-18-03.LU10683.neterror.tgz&lt;br/&gt;
the names of the files tell you when they were captured.&lt;br/&gt;
also messages.checksum-i gives you an overall picture of when and where the dk&apos;s ran too.&lt;br/&gt;
arkles are servers. john&apos;s are clients.&lt;/p&gt;

&lt;p&gt;the 4 fids fingered are&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john5 ~]# lfs fid2path /dagg 0x20001251f:0x144:0x0
/dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0500/meraxes_grids_74.hdf5
[root@john5 ~]# lfs fid2path /dagg 0x280022737:0x5b:0x0
/dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0010/meraxes_grids_72.hdf5
[root@john5 ~]# lfs fid2path /dagg 0x680024e04:0x46:0x0
/dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0050/meraxes_grids_67.hdf5
[root@john5 ~]# lfs fid2path /dagg 0x680024e04:0x47:0x0
/dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0050/meraxes_grids_68.hdf5
[root@john5 ~]# ls -l /dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0500/meraxes_grids_74.hdf5 /dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0010/meraxes_grids_72.hdf5 /dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0050/meraxes_grids_67.hdf5 /dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0050/meraxes_grids_68.hdf5
-rw-rw-r-- 1 yqin oz025 4832124728 Apr 18 03:31 /dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0010/meraxes_grids_72.hdf5
-rw-rw-r-- 1 yqin oz025 4832124728 Apr 18 03:22 /dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0050/meraxes_grids_67.hdf5
-rw-rw-r-- 1 yqin oz025 4832124728 Apr 18 03:24 /dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0050/meraxes_grids_68.hdf5
-rw-rw-r-- 1 yqin oz025 4832124728 Apr 18 03:35 /dagg/projects/oz025/$user/dragons/results/Tiamat/popiii_v2_tocf/PopIIIEfficiency_0pt0500/meraxes_grids_74.hdf5
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and that group is currently well under quota&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john5 ~]# lfs quota -h -g oz025 /dagg/
Disk quotas for grp oz025 (gid 10227):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
         /dagg/  6.342T      0k     10T       -  188885       0 1000000       -
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I&apos;ll check with them about the state of jobs that ran last night, and also the state of those files, and quota.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="226251" author="scadmin" created="Wed, 18 Apr 2018 15:35:07 +0000"  >&lt;p&gt;so far it looks like this group was NOT over quota, but was again using parallel hdf5 writes.&lt;/p&gt;

&lt;p&gt;the 4 files in question apparently look ok to them (ie. not obviously corrupted).&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="226346" author="hongchao.zhang" created="Thu, 19 Apr 2018 10:46:12 +0000"  >&lt;p&gt;As per the logs, the checksums are a little strange on the client side:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[3111837064-3112361351]: client csum c253e960, server csum 5e6e4da8
00000020:02020000:15.0:1523986236.833762:0:10678:0:(tgt_handler.c:2112:tgt_warn_on_cksum()) 168-f: dagg-OST0004: BAD WRITE CHECKSUM: from 12345-192.168.44.170@o2ib44 inode [0x280022737:0x5b:0x0] object 0x540000400:8286655 extent [3186813384-3187337671]: client csum c253e960, server csum a42a5590
00000020:02020000:1.0:1523986236.845204:0:285172:0:(tgt_handler.c:2112:tgt_warn_on_cksum()) 168-f: dagg-OST0004: BAD WRITE CHECKSUM: from 12345-192.168.44.170@o2ib44 inode [0x280022737:0x5b:0x0] object 0x540000400:8286655 extent [3069890888-3070415175]: client csum c253e960, server csum fc43f2ee
00000020:02020000:17.0:1523986236.845219:0:298862:0:(tgt_handler.c:2112:tgt_warn_on_cksum()) 168-f: dagg-OST0004: BAD WRITE CHECKSUM: from 12345-192.168.44.170@o2ib44 inode [0x280022737:0x5b:0x0] object 0x540000400:8286655 extent [3128614280-3129138567]: client csum c253e960, server csum 976d257c
00000020:02020000:5.0:1523986236.845231:0:287464:0:(tgt_handler.c:2112:tgt_warn_on_cksum()) 168-f: dagg-OST0004: BAD WRITE CHECKSUM: from 12345-192.168.44.170@o2ib44 inode [0x280022737:0x5b:0x0] object 0x540000400:8286655 extent [3095056712-3095580999]: client csum c253e960, server csum 1747b303
00000020:02020000:7.0:1523986236.845411:0:287463:0:(tgt_handler.c:2112:tgt_warn_on_cksum()) 168-f: dagg-OST0004: BAD WRITE CHECKSUM: from 12345-192.168.44.170@o2ib44 inode [0x280022737:0x5b:0x0] object 0x540000400:8286655 extent [3061502280-3062026567]: client csum c253e960, server csum 1c6d4fd9
00000020:02020000:9.0:1523986236.845578:0:134229:0:(tgt_handler.c:2112:tgt_warn_on_cksum()) 168-f: dagg-OST0004: BAD WRITE CHECKSUM: from 12345-192.168.44.170@o2ib44 inode [0x280022737:0x5b:0x0] object 0x540000400:8286655 extent [3036333320-3036857607]: client csum c253e960, server csum 4b320be5
00000020:02020000:15.0:1523986236.845877:0:287086:0:(tgt_handler.c:2112:tgt_warn_on_cksum()) 168-f: dagg-OST0004: BAD WRITE CHECKSUM: from 12345-192.168.44.170@o2ib44 inode [0x280022737:0x5b:0x0] object 0x540000400:8286655 extent [3027420424-3027944711]: client csum c253e960, server csum d537dae0
00000020:02020000:1.0:1523986236.846009:0:34879:0:(tgt_handler.c:2112:tgt_warn_on_cksum()) 168-f: dagg-OST0004: BAD WRITE CHECKSUM: from 12345-192.168.44.170@o2ib44 inode [0x280022737:0x5b:0x0] object 0x540000400:8286655 extent [3102921032-3103445319]: client csum c253e960, server csum 50418907
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;the checksums of the different extents of the same file (0x280022737:0x5b:0x0) are all the same on the client side (all are &quot;c253e960&quot;),&lt;br/&gt;
but they differ on the server side.&lt;/p&gt;

&lt;p&gt;I&apos;ll look at it more deeply to find out what caused it.&lt;/p&gt;</comment>
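A quick sanity check on the log excerpt above (a sketch using extent offsets copied verbatim from those lines): every bad extent spans exactly 512 KiB, consistent with the corruption hitting whole 512 KiB bulk fragments rather than arbitrary byte ranges.

```python
# Extent boundaries copied from the BAD WRITE CHECKSUM log lines above.
# Extents are inclusive, so the length is end - start + 1.
extents = [
    (3186813384, 3187337671),
    (3069890888, 3070415175),
    (3128614280, 3129138567),
    (3095056712, 3095580999),
]

for start, end in extents:
    # Each reported bad extent is exactly 512 KiB (524288 bytes).
    assert end - start + 1 == 512 * 1024
```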
                            <comment id="227307" author="scadmin" created="Fri, 4 May 2018 09:26:29 +0000"  >&lt;p&gt;from reading the Lustre manual it sounds like writes that hit these checksum errors are retried after they are detected, so the users might be seeing no effect from them.&lt;br/&gt;
is that correct?&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="227541" author="adilger" created="Tue, 8 May 2018 22:17:48 +0000"  >&lt;p&gt;Correct.  The client will resend on checksum failures up to 10 times by default (controlled by &lt;tt&gt;osc.&amp;#42;.resend_count&lt;/tt&gt;).&lt;/p&gt;

&lt;p&gt;That the checksum at the client is always the same implies that the data is also the same (e.g. all zero).  That it is different on the server each time implies it is being changed after the client has computed the checksum (e.g. in client RAM, over the network, or in OSS RAM).  If you are using &lt;tt&gt;mmap()&lt;/tt&gt; on files or &lt;tt&gt;O_DIRECT&lt;/tt&gt; with another thread modifying the pages it is possible to see such corruption, but Lustre can&apos;t do anything about it (short of copying the data, which is highly undesirable).&lt;/p&gt;</comment>
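The reasoning in the comment above can be sketched in a few lines. This is illustrative only: Lustre's actual wire checksum algorithm is selected by `osc.*.checksum_type`, and `zlib.crc32` merely stands in for it here. Identical page content yields identical client checksums, while any modification after the client computed its checksum shows up as a different checksum on the server.

```python
import zlib

# Illustrative stand-in for Lustre's bulk checksum (the real algorithm
# is whatever osc.*.checksum_type selects, e.g. adler or crc32c).

# Identical (e.g. all-zero) 512 KiB extents checksum identically on the
# client, matching the repeated "client csum c253e960" in the logs.
extent_a = bytes(512 * 1024)
extent_b = bytes(512 * 1024)
assert zlib.crc32(extent_a) == zlib.crc32(extent_b)

# A single bit flipped after the client checksummed the pages (in client
# RAM, on the wire, or in OSS RAM) yields a different server-side
# checksum, which is exactly the mismatch the OSS reports.
corrupted = bytearray(extent_a)
corrupted[12345] ^= 0x01
assert zlib.crc32(bytes(corrupted)) != zlib.crc32(extent_a)
```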
                            <comment id="229953" author="gerrit" created="Thu, 5 Jul 2018 11:44:56 +0000"  >&lt;p&gt;Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32788&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32788&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10683&quot; title=&quot;write checksum errors&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10683&quot;&gt;&lt;del&gt;LU-10683&lt;/del&gt;&lt;/a&gt; osd_zfs: set offset in page correctly&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1fb85e7eba6823b5822f7298c7fa648770239635&lt;/p&gt;</comment>
                            <comment id="230823" author="gerrit" created="Tue, 24 Jul 2018 15:59:33 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/32788/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32788/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10683&quot; title=&quot;write checksum errors&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10683&quot;&gt;&lt;del&gt;LU-10683&lt;/del&gt;&lt;/a&gt; osd_zfs: set offset in page correctly&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 83cb17031913ba2f33a5b67219a03c5605f48f27&lt;/p&gt;</comment>
                            <comment id="231070" author="gerrit" created="Mon, 30 Jul 2018 15:47:15 +0000"  >&lt;p&gt;Minh Diep (mdiep@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32899&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32899&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10683&quot; title=&quot;write checksum errors&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10683&quot;&gt;&lt;del&gt;LU-10683&lt;/del&gt;&lt;/a&gt; osd_zfs: set offset in page correctly&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1484662e0213007bfa5b7b68f02df77719a1a6d7&lt;/p&gt;</comment>
                            <comment id="231332" author="gerrit" created="Thu, 2 Aug 2018 19:25:37 +0000"  >&lt;p&gt;John L. Hammond (jhammond@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/32899/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32899/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10683&quot; title=&quot;write checksum errors&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10683&quot;&gt;&lt;del&gt;LU-10683&lt;/del&gt;&lt;/a&gt; osd_zfs: set offset in page correctly&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 686a73ea9467c53d261cf12d0802bb1332d50f4a&lt;/p&gt;</comment>
                            <comment id="237540" author="gerrit" created="Tue, 27 Nov 2018 17:25:39 +0000"  >&lt;p&gt;Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33728&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33728&lt;/a&gt;&lt;br/&gt;
Subject: Revert &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10683&quot; title=&quot;write checksum errors&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10683&quot;&gt;&lt;del&gt;LU-10683&lt;/del&gt;&lt;/a&gt; osd_zfs: set offset in page correctly&quot;&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a95e7e398ee411a14c3d67072fbea273832e0957&lt;/p&gt;</comment>
                            <comment id="237605" author="gerrit" created="Wed, 28 Nov 2018 17:12:51 +0000"  >&lt;p&gt;Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33741&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33741&lt;/a&gt;&lt;br/&gt;
Subject: Revert &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10683&quot; title=&quot;write checksum errors&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10683&quot;&gt;&lt;del&gt;LU-10683&lt;/del&gt;&lt;/a&gt; osd_zfs: set offset in page correctly&quot;&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 3c3bbb07d12e327307fdea46aa48f22d50faf947&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="54018">LU-11663</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="52588">LU-11093</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="30025" name="2018-04-18-03.LU10683.neterror.tgz" size="49162456" author="scadmin" created="Wed, 18 Apr 2018 08:26:43 +0000"/>
                            <attachment id="29605" name="messages-20180220.txt" size="107511" author="rjh" created="Tue, 20 Feb 2018 04:36:09 +0000"/>
                            <attachment id="29843" name="messages_lustre-servers-and-clients-20180316.log" size="669982" author="rjh" created="Fri, 16 Mar 2018 04:16:09 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzszj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>