<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:41:24 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4290] osp_sync_threads encounters EIO on mount</title>
                <link>https://jira.whamcloud.com/browse/LU-4290</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We encountered this assertion in production, libcfs_panic_on_lbug was set to 1, so server rebooted. On mount, the same assertion and lbug would occur. Filesystem will mount with panic_on_lbug set to 0. We&apos;ve captured a crash dump and lustre log messages with the debug flags:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@atlas-mds3 ~&amp;#93;&lt;/span&gt;# cat /proc/sys/lnet/debug&lt;br/&gt;
trace ioctl neterror warning other error emerg ha config console&lt;/p&gt;

&lt;p&gt;Ran e2fsck:&lt;br/&gt;
e2fsck -f -j /dev/mapper/atlas2-mdt1-journal /dev/mapper/atlas2-mdt1 &lt;/p&gt;

&lt;p&gt;and only fixed the quota inconsistencies it found.&lt;/p&gt;

&lt;p&gt;At the moment, we are back to production after the osp_sync_threads lbugs on mount.  There are hung task messages about osp_sync_threads as would be expected. We want to fix the root issue that is causing the assertions. &lt;/p&gt;

&lt;p&gt;kernel messages during one of the failed mounts&lt;br/&gt;
Nov 21 21:16:44 atlas-mds3 kernel: [  911.319839] LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts:&lt;br/&gt;
Nov 21 21:16:44 atlas-mds3 kernel: [  911.986208] Lustre: mdt_num_threads module parameter is deprecated, use mds_num_threads instead or unset both for dynamic thread startup&lt;br/&gt;
Nov 21 21:16:46 atlas-mds3 kernel: [  913.069371] Lustre: atlas2-MDT0000: used disk, loading&lt;br/&gt;
Nov 21 21:16:47 atlas-mds3 kernel: [  914.261572] LustreError: 18945:0:(osp_sync.c:862:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5&lt;br/&gt;
Nov 21 21:16:47 atlas-mds3 kernel: [  914.278318] LustreError: 18945:0:(osp_sync.c:862:osp_sync_thread()) LBUG&lt;br/&gt;
Nov 21 21:16:47 atlas-mds3 kernel: [  914.286036] Pid: 18945, comm: osp-syn-256&lt;br/&gt;
Nov 21 21:16:47 atlas-mds3 kernel: [  914.290841]&lt;br/&gt;
Nov 21 21:16:47 atlas-mds3 kernel: [  914.290844] Call Trace: &lt;/p&gt;

&lt;p&gt;We also see this message:&lt;br/&gt;
Nov 21 23:01:01 atlas-mds3 kernel: [ 1512.633528] ERST: NVRAM ERST Log Address Range is not implemented yet&lt;/p&gt;</description>
                <environment>RHEL 6.4/distro IB</environment>
        <key id="22204">LU-4290</key>
            <summary>osp_sync_threads encounters EIO on mount</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="blakecaldwell">Blake Caldwell</reporter>
                        <labels>
                    </labels>
                <created>Fri, 22 Nov 2013 04:48:25 +0000</created>
                <updated>Thu, 20 Mar 2014 14:28:11 +0000</updated>
                            <resolved>Thu, 20 Mar 2014 14:28:11 +0000</resolved>
                                    <version>Lustre 2.4.1</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="72095" author="pjones" created="Fri, 22 Nov 2013 05:40:28 +0000"  >&lt;p&gt;Alex&lt;/p&gt;

&lt;p&gt;Could you please comment on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="72096" author="bzzz" created="Fri, 22 Nov 2013 05:44:27 +0000"  >&lt;p&gt;Hi Blake, do you have lustre logs for the case?&lt;/p&gt;</comment>
                            <comment id="72098" author="blakecaldwell" created="Fri, 22 Nov 2013 05:58:52 +0000"  >&lt;p&gt;Similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3063&quot; title=&quot;osp_sync.c:866:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 29 changes, 26 in progress, 7 in flight: -5&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3063&quot;&gt;&lt;del&gt;LU-3063&lt;/del&gt;&lt;/a&gt; assert with EIO, but this was on MDT and there was no memory pressure.&lt;/p&gt;</comment>
                            <comment id="72101" author="bzzz" created="Fri, 22 Nov 2013 07:18:06 +0000"  >&lt;p&gt;strange.. it looks like the log file is short. the last claimed record is 64767:&lt;/p&gt;

&lt;p&gt;00000040:00001000:15.0:1385095225.257089:0:19973:0:(llog.c:344:llog_process_thread()) index: 3430 last_index 64767&lt;/p&gt;

&lt;p&gt;but we aren&apos;t able to find 3429 even:&lt;/p&gt;

&lt;p&gt;00000040:00001000:23.0:1385095225.346339:0:19973:0:(llog_osd.c:553:llog_osd_next_block()) looking for log index 3430 (cur idx 3430 off 227712)&lt;br/&gt;
00000040:00000001:23.0:1385095225.346342:0:19973:0:(llog_osd.c:656:llog_osd_next_block()) Process leaving via out (rc=18446744073709551611 : -5 : 0xfffffffffffffffb)&lt;/p&gt;

&lt;p&gt;what kind of storage do you use? is it a software raid? have you encountered power loss issues or something similar?&lt;/p&gt;</comment>
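The three numbers in the llog_osd_next_block trace above (rc=18446744073709551611 : -5 : 0xfffffffffffffffb) are a single return value, -5 (-EIO on Linux), printed as an unsigned 64-bit integer, a signed integer, and hex. A minimal Python sketch of that reinterpretation:

```python
# -5 stored in a 64-bit register, read back as unsigned and as hex.
rc_signed = -5

# Mask to 64 bits to reinterpret the two's-complement value as unsigned.
rc_unsigned = rc_signed & 0xFFFFFFFFFFFFFFFF

print(rc_unsigned)       # 18446744073709551611
print(hex(rc_unsigned))  # 0xfffffffffffffffb
```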
<comment id="72122" author="blakecaldwell" created="Fri, 22 Nov 2013 12:58:43 +0000"  >&lt;p&gt;It&apos;s a SAS-attached NetApp 5524 with redundant controllers. There were no power issues. However, there were problems on the hardware side. At the time, some read-only code was being tested that triggered an error on the storage device. The messages below are representative of the ones seen. The EIO errors below should have been returned to the testing code, not a Lustre process. However, something related caused an EIO to the osp_sync_threads, tripping the LBUG around 20:20.&lt;/p&gt;

&lt;p&gt;Nov 21 20:01:18 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408100.021270&amp;#93;&lt;/span&gt; sd 6:0:13:0: &lt;span class=&quot;error&quot;&gt;&amp;#91;sdp&amp;#93;&lt;/span&gt; Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK&lt;br/&gt;
Nov 21 20:01:18 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408100.030243&amp;#93;&lt;/span&gt; sd 6:0:13:0: &lt;span class=&quot;error&quot;&gt;&amp;#91;sdp&amp;#93;&lt;/span&gt; CDB: Read(10): 28 00 dc 60 01 00 00 3c 00 00&lt;br/&gt;
Nov 21 20:01:18 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408100.040522&amp;#93;&lt;/span&gt; device-mapper: multipath: Failing path 8:240.&lt;br/&gt;
Nov 21 20:01:19 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408100.389700&amp;#93;&lt;/span&gt; sd 6:0:12:0: &lt;span class=&quot;error&quot;&gt;&amp;#91;sdk&amp;#93;&lt;/span&gt; Unhandled error code&lt;br/&gt;
Nov 21 20:01:19 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408100.395458&amp;#93;&lt;/span&gt; sd 6:0:12:0: &lt;span class=&quot;error&quot;&gt;&amp;#91;sdk&amp;#93;&lt;/span&gt; Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK&lt;br/&gt;
Nov 21 20:01:19 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408100.404427&amp;#93;&lt;/span&gt; sd 6:0:12:0: &lt;span class=&quot;error&quot;&gt;&amp;#91;sdk&amp;#93;&lt;/span&gt; CDB: Read(10): 28 00 dc 60 01 00 00 3c 00 00&lt;br/&gt;
Nov 21 20:01:19 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408100.412758&amp;#93;&lt;/span&gt; device-mapper: multipath: Failing path 8:160.&lt;br/&gt;
Nov 21 20:01:19 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408100.766397&amp;#93;&lt;/span&gt; sd 6:0:8:0: &lt;span class=&quot;error&quot;&gt;&amp;#91;sdaj&amp;#93;&lt;/span&gt; Unhandled error code&lt;br/&gt;
Nov 21 20:01:19 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408100.772150&amp;#93;&lt;/span&gt; sd 6:0:8:0: &lt;span class=&quot;error&quot;&gt;&amp;#91;sdaj&amp;#93;&lt;/span&gt; Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK&lt;br/&gt;
Nov 21 20:01:19 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408100.781131&amp;#93;&lt;/span&gt; sd 6:0:8:0: &lt;span class=&quot;error&quot;&gt;&amp;#91;sdaj&amp;#93;&lt;/span&gt; CDB: Read(10): 28 00 dc 60 01 00 00 3c 00 00&lt;br/&gt;
Nov 21 20:01:19 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408100.789487&amp;#93;&lt;/span&gt; device-mapper: multipath: Failing path 66:48.&lt;br/&gt;
Nov 21 20:01:21 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408102.801256&amp;#93;&lt;/span&gt; Buffer I/O error on device dm-2, logical block 404822017&lt;br/&gt;
Nov 21 20:01:21 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408102.808691&amp;#93;&lt;/span&gt; lost page write due to I/O error on dm-2&lt;br/&gt;
Nov 21 20:01:21 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408102.814580&amp;#93;&lt;/span&gt; end_request: I/O error, dev dm-2, sector 3239118328&lt;br/&gt;
Nov 21 20:01:21 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408102.975980&amp;#93;&lt;/span&gt; end_request: I/O error, dev dm-2, sector 3239442680&lt;br/&gt;
Nov 21 20:01:21 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408102.982901&amp;#93;&lt;/span&gt; end_request: I/O error, dev dm-2, sector 3239442944&lt;br/&gt;
Nov 21 20:01:21 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408102.989845&amp;#93;&lt;/span&gt; end_request: I/O error, dev dm-2, sector 3239447520&lt;br/&gt;
Nov 21 20:01:21 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408102.996781&amp;#93;&lt;/span&gt; end_request: I/O error, dev dm-2, sector 3239462088&lt;/p&gt;

&lt;p&gt;The system rebooted at ~20:20 and the last messages in syslog were:&lt;br/&gt;
Nov 21 20:13:48 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408848.596067&amp;#93;&lt;/span&gt; sd 6:0:8:0: &lt;span class=&quot;error&quot;&gt;&amp;#91;sdaj&amp;#93;&lt;/span&gt; CDB: Read(10): 28 00 dc 60 01 08 00 18 00 00&lt;br/&gt;
Nov 21 20:13:48 atlas-mds3 kernel: &lt;span class=&quot;error&quot;&gt;&amp;#91;3408848.604397&amp;#93;&lt;/span&gt; device-mapper: multipath: Failing path 66:48.&lt;/p&gt;</comment>
<comment id="72124" author="bzzz" created="Fri, 22 Nov 2013 13:44:03 +0000"  >&lt;p&gt;I can&apos;t say for sure, but the llog code detected missing records in the file, which might be a result of I/O issues.&lt;br/&gt;
the simplest solution is to remove that file, at the cost of orphaned objects on the OST (so, lost disk space) until lfsck is run.&lt;/p&gt;</comment>
                            <comment id="72129" author="blakecaldwell" created="Fri, 22 Nov 2013 14:15:27 +0000"  >&lt;p&gt;Delete which file for OST llog?&lt;/p&gt;</comment>
<comment id="72136" author="blakecaldwell" created="Fri, 22 Nov 2013 15:22:20 +0000"  >&lt;p&gt;We will have an opportunity next Tuesday if more work is needed (truncate llog?). When running with hung osp_sync_threads tasks, are we turning away client I/O for that OST?&lt;/p&gt;</comment>
                            <comment id="72141" author="bzzz" created="Fri, 22 Nov 2013 15:50:52 +0000"  >&lt;p&gt;the responsibility of osp_sync_thread is to destroy OST objects once the corresponding files are unlinked. technically it should be OK to run MDS with OSP thread stuck. but I&apos;d rather remove the file manually...&lt;/p&gt;

&lt;p&gt;00000040:00080000:15.0:1385095225.257044:0:19973:0:(llog_cat.c:558:llog_cat_process_cb()) processing log 0x18a3:1:0 at index 6 of catalog 0x204:1&lt;/p&gt;


&lt;p&gt;so, it should be /O/1/d3/6307. I&apos;d verify this with the llog_reader utility; the file&apos;s size should be a bit more than 227712. it&apos;d be great if you can attach the output of llog_reader here. once the file is confirmed you can remove it manually and start the MDS.&lt;/p&gt;</comment>
                            <comment id="72151" author="dillowda" created="Fri, 22 Nov 2013 17:45:51 +0000"  >&lt;p&gt;We&apos;ll need to find four files IIRC, since we had four separate OST processes LBUG, correct?&lt;/p&gt;

&lt;p&gt;Blake, we should see how many of the llog_cat.c:558 messages we can find and correlate with the 4 threads that died; we may have to iterate a few times to get them all.&lt;/p&gt;</comment>
                            <comment id="72152" author="dillowda" created="Fri, 22 Nov 2013 17:46:51 +0000"  >&lt;p&gt;Alex, can you comment on the mapping from 0x18a3:1:0 to /O/1/d3/6307 so we can replicate it for the other logs?&lt;/p&gt;</comment>
<comment id="72153" author="dillowda" created="Fri, 22 Nov 2013 17:51:17 +0000"  >&lt;p&gt;Here are the logs being processed by the four threads that LBUG&apos;d:&lt;br/&gt;
00000040:00080000:15.0:1385095225.247128:0:19969:0:(llog_cat.c:558:llog_cat_process_cb()) processing log 0x18a1:1:0 at index 6 of catalog 0x200:1&lt;br/&gt;
00000040:00080000:15.0:1385095225.253880:0:19971:0:(llog_cat.c:558:llog_cat_process_cb()) processing log 0x18a2:1:0 at index 6 of catalog 0x202:1&lt;br/&gt;
00000040:00080000:15.0:1385095225.257044:0:19973:0:(llog_cat.c:558:llog_cat_process_cb()) processing log 0x18a3:1:0 at index 6 of catalog 0x204:1&lt;br/&gt;
00000040:00080000:15.0:1385095225.261700:0:19975:0:(llog_cat.c:558:llog_cat_process_cb()) processing log 0x18a4:1:0 at index 6 of catalog 0x206:1&lt;/p&gt;</comment>
<comment id="72154" author="bzzz" created="Fri, 22 Nov 2013 17:54:19 +0000"  >&lt;p&gt;0x18a3:1:0 - 1 is the sequence, so /O/1 - a hierarchy storing all the objects from sequence 1.&lt;/p&gt;

&lt;p&gt;(gdb) p 0x18a3&lt;br/&gt;
$1 = 6307&lt;br/&gt;
(gdb) p 0x18a3 &amp;amp; 31&lt;br/&gt;
$1 = 3&lt;/p&gt;

&lt;p&gt;/O/1/d3/6307&lt;/p&gt;

&lt;p&gt;I&apos;d suggest renaming the files to something like 6307.B rather than just removing them, so you can recover if needed.&lt;/p&gt;
</comment>
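The id-to-path mapping in the gdb arithmetic above can be sketched as a small helper (the function name is illustrative, not an actual Lustre utility): the sequence selects /O/1, and the low 5 bits of the object id (id &amp; 31) select one of 32 dN hash subdirectories.

```python
def llog_object_path(object_id: int, sequence: int) -> str:
    """Illustrative mapping from a log id like 0x18a3:1:0 to its
    on-disk ldiskfs path; not an actual Lustre helper function."""
    subdir = object_id & 31  # low 5 bits pick one of 32 dN directories
    return f"/O/{sequence}/d{subdir}/{object_id}"

# 0x18a3 == 6307 and 6307 & 31 == 3, matching the gdb session above:
print(llog_object_path(0x18a3, 1))  # /O/1/d3/6307
```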
                            <comment id="72158" author="blakecaldwell" created="Fri, 22 Nov 2013 18:49:22 +0000"  >&lt;p&gt;Alex,&lt;/p&gt;

&lt;p&gt;We pulled the files below from debugfs and put them through llog_reader. The output of each (0x18a4, 0x18a3, 0x18a2, 0x18a1) is attached. We compared it to another log file that was processed successfully, which was empty.&lt;/p&gt;

&lt;p&gt;We will wait to hear back if these look good and will make a copy and remove.&lt;/p&gt;</comment>
                            <comment id="72165" author="blakecaldwell" created="Fri, 22 Nov 2013 20:13:09 +0000"  >&lt;p&gt;If someone could please confirm whether the attached llogs contain the expected information, we will proceed with renaming them. Until then we will operate in this non-optimal state. Thanks.&lt;/p&gt;</comment>
<comment id="72239" author="bzzz" created="Mon, 25 Nov 2013 16:39:59 +0000"  >&lt;p&gt;looks so. I suggest renaming them using a direct ldiskfs mount and restarting the MDS.&lt;/p&gt;</comment>
<comment id="72340" author="blakecaldwell" created="Tue, 26 Nov 2013 21:00:11 +0000"  >&lt;p&gt;Renaming the files in ldiskfs resolved the mount issues. It reported a failure reading 2 llogs and the mount was successful. The messages for the other 2 were likely rate-limited by rsyslog.&lt;/p&gt;

&lt;p&gt;Nov 26 09:33:17 atlas-mds3 kernel: [  251.406572] Lustre: Lustre: Build Version: 2.4.1--CHANGED-2.6.32-358.18.1.el6.atlas.x86_64&lt;br/&gt;
Nov 26 09:33:27 atlas-mds3 kernel: [  261.291971] LDISKFS-fs (dm-3): recovery complete&lt;br/&gt;
Nov 26 09:33:27 atlas-mds3 kernel: [  261.343137] LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts: &lt;br/&gt;
Nov 26 09:33:28 atlas-mds3 kernel: [  262.967910] Lustre: atlas2-MDT0000: used disk, loading&lt;br/&gt;
Nov 26 09:33:29 atlas-mds3 kernel: [  263.695531] LustreError: 14567:0:(llog_cat.c:192:llog_cat_id2handle()) atlas2-OST00ff-osc-MDT0000: error opening log id 0x18a1:1:0: rc = -2&lt;br/&gt;
Nov 26 09:33:29 atlas-mds3 kernel: [  263.709646] LustreError: 14567:0:(llog_cat.c:795:cat_cancel_cb()) atlas2-OST00ff-osc-MDT0000: cannot find handle for llog 0x18a1:1: -2&lt;br/&gt;
Nov 26 09:33:29 atlas-mds3 kernel: [  263.723280] LustreError: 14567:0:(llog_cat.c:833:llog_cat_init_and_process()) atlas2-OST00ff-osc-MDT0000: llog_process() with cat_cancel_cb failed: rc = -2&lt;br/&gt;
Nov 26 09:33:29 atlas-mds3 kernel: [  263.746477] LustreError: 14567:0:(llog_cat.c:192:llog_cat_id2handle()) atlas2-OST0100-osc-MDT0000: error opening log id 0x18a2:1:0: rc = -2&lt;br/&gt;
Nov 26 09:33:29 atlas-mds3 kernel: [  263.760683] LustreError: 14567:0:(llog_cat.c:795:cat_cancel_cb()) atlas2-OST0100-osc-MDT0000: cannot find handle for llog 0x18a2:1: -2&lt;br/&gt;
Nov 26 09:33:29 atlas-mds3 kernel: [  263.774522] LustreError: 14567:0:(llog_cat.c:833:llog_cat_init_and_process()) atlas2-OST0100-osc-MDT0000: llog_process() with cat_cancel_cb failed: rc = -2&lt;br/&gt;
Nov 26 09:33:31 atlas-mds3 kernel: [  265.647379] LustreError: 11-0: atlas2-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.&lt;br/&gt;
Nov 26 09:33:31 atlas-mds3 kernel: [  265.655134] Lustre: atlas2-MDT0000: Imperative Recovery enabled, recovery window shrunk from 1800-5400 down to 900-2700&lt;/p&gt;</comment>
<comment id="72341" author="blakecaldwell" created="Tue, 26 Nov 2013 21:15:00 +0000"  >&lt;p&gt;When looking at the mtimes of the other llogs, we noticed that they corresponded to the time of the last successful mount. Note that 6305 was Nov  5 12:07. The output of llog_reader contains the same:&lt;br/&gt;
Time : Tue Nov  5 12:07:45 2013&lt;/p&gt;

&lt;p&gt;Now the question is why these logs got truncated during normal operation. We identified that the MDT returned an error code not handled by the SCSI layer.&lt;br/&gt;
Nov 26 10:58:27 atlas-mds3 kernel: [ 3820.152740] mpt2sas0: #011handle(0x000c), ioc_status(scsi data underrun)(0x0045), smid(1750)&lt;/p&gt;

&lt;p&gt;So if the other llogs were consumed and cleared on lustre mount (is that correct?), they don&apos;t appear to get appended/committed to in normal operation. Why would an EIO affect the llogs?&lt;/p&gt;

&lt;p&gt;[root@atlas-mds3 d1]# ls -l&lt;br/&gt;
total 740&lt;br/&gt;
-rw-r--r-- 1 root root  19776 Nov 21 23:40 10017&lt;br/&gt;
-rw-r--r-- 1 root root  19584 Nov 21 23:40 10049&lt;br/&gt;
-rw-r--r-- 1 root root  19712 Nov 21 23:40 10081&lt;br/&gt;
-rw-r--r-- 1 root root 229120 Nov  5 12:07 6305&lt;br/&gt;
-rw-r--r-- 1 root root   8320 Nov 21 22:42 8321&lt;br/&gt;
-rw-r--r-- 1 root root  24192 Nov 21 23:40 9345&lt;br/&gt;
-rw-r--r-- 1 root root  24192 Nov 21 23:40 9377&lt;/p&gt;</comment>
                            <comment id="72422" author="adilger" created="Wed, 27 Nov 2013 19:03:43 +0000"  >&lt;p&gt;Alex, it is never good to assert on IO errors from the disk. Should this be converted to an error and handled more gracefully?&lt;/p&gt;</comment>
                            <comment id="72461" author="bzzz" created="Thu, 28 Nov 2013 05:48:15 +0000"  >&lt;p&gt;yes, definitely. I&apos;m thinking on what would be a good reaction here. just skip such a log? remove it?&lt;/p&gt;</comment>
                            <comment id="72535" author="adilger" created="Fri, 29 Nov 2013 18:09:38 +0000"  >&lt;p&gt;Blake,&lt;br/&gt;
The timestamps on the logs are not updated by the Lustre code, so that is why they appear not to be modified after mount. Also, logs are only used once and then deleted, so new ones are created on each mount. &lt;/p&gt;

&lt;p&gt;Alex,&lt;br/&gt;
I think that if there is an error looking up a record in the llog, that unlink should be skipped and the next record processed. Once all the records are processed (for good or bad) the log file will be deleted anyway. I don&apos;t think this should be handled by the llog code internally, since we don&apos;t necessarily want to delete a config file if there is a bad block on disk or some other toolset problem. For the object unlink case, it would eventually be cleaned up by LFSCK, so I don&apos;t think it is terrible if some records are not processed. &lt;/p&gt;</comment>
<comment id="76611" author="hilljjornl" created="Mon, 10 Feb 2014 16:49:58 +0000"  >&lt;p&gt;So the last discussion point was about a forward-looking fix - has any work occurred? Should we create another LU for that effort and close this issue out?&lt;/p&gt;

&lt;p&gt;&amp;#8211;&lt;br/&gt;
-Jason&lt;/p&gt;</comment>
                            <comment id="76821" author="bzzz" created="Wed, 12 Feb 2014 12:24:06 +0000"  >&lt;p&gt;Jason, I&apos;ve been working on an automated test for this and similar issues.&lt;/p&gt;</comment>
                            <comment id="77107" author="bzzz" created="Fri, 14 Feb 2014 19:12:18 +0000"  >&lt;p&gt;a preliminary patch: &lt;a href=&quot;http://review.whamcloud.com/#/c/9281/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9281/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="79228" author="jamesanunez" created="Thu, 13 Mar 2014 13:45:54 +0000"  >&lt;p&gt;The patch &lt;a href=&quot;http://review.whamcloud.com/#/c/9281/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9281/&lt;/a&gt; landed for 2.6. &lt;/p&gt;

&lt;p&gt;The patch that landed was &quot;preliminary&quot;. Does anything else need to be done to complete this ticket?&lt;/p&gt;</comment>
                            <comment id="79229" author="blakecaldwell" created="Thu, 13 Mar 2014 13:50:09 +0000"  >&lt;p&gt;Great! This ticket can be closed now.&lt;/p&gt;</comment>
<comment id="79852" author="pjones" created="Thu, 20 Mar 2014 14:28:11 +0000"  >&lt;p&gt;I checked with Alex and he agrees to close this ticket and track any further similar work under new tickets&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="13860" name="6305.llog_out" size="74949" author="blakecaldwell" created="Fri, 22 Nov 2013 18:50:01 +0000"/>
                            <attachment id="13861" name="6306.llog_out" size="74377" author="blakecaldwell" created="Fri, 22 Nov 2013 18:50:01 +0000"/>
                            <attachment id="13862" name="6307.llog_out" size="74465" author="blakecaldwell" created="Fri, 22 Nov 2013 18:50:01 +0000"/>
                            <attachment id="13863" name="6308.llog_out" size="74311" author="blakecaldwell" created="Fri, 22 Nov 2013 18:50:01 +0000"/>
                            <attachment id="13851" name="lustre-log.1385095225.19969.gz" size="38538" author="blakecaldwell" created="Fri, 22 Nov 2013 05:47:59 +0000"/>
                            <attachment id="13852" name="lustre-log.1385095225.19971.gz" size="7444" author="blakecaldwell" created="Fri, 22 Nov 2013 05:47:59 +0000"/>
                            <attachment id="13853" name="lustre-log.1385095225.19973.gz" size="3022433" author="blakecaldwell" created="Fri, 22 Nov 2013 05:47:59 +0000"/>
                            <attachment id="13854" name="lustre-log.1385095225.19975.gz" size="4976" author="blakecaldwell" created="Fri, 22 Nov 2013 05:47:59 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw9w7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11773</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>