<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:01:43 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6612] (llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile</title>
                <link>https://jira.whamcloud.com/browse/LU-6612</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi &lt;/p&gt;

&lt;p&gt;We are seeing that the MDT occupies a lot of disk space, and most of that space is consumed by the OBJECTS/* files on the MDT. We took a backup of the MDT, restored it in the test lab, deleted all the files residing in OBJECTS/*, removed the CATALOG file, remounted with -t lustre, and collected the logs. Below is a snippet showing the pattern of log messages generated for each file removed. &lt;/p&gt;

 &lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; 00000040:00020000:0.0:1431530471.926250:0:17345:0:(llog_obd.c:320:cat_cancel_cb())
 Cannot find handle for log 0x2793959b: -116
 00000040:00080000:0.0:1431530471.926254:0:17345:0:(llog_obd.c:346:cat_cancel_cb())
 cancel log 0x2793959b:f6d1b675 at index 58627 of catalog 0x13600004
 00000040:00080000:0.0:1431530471.926255:0:17345:0:(llog_obd.c:315:cat_cancel_cb())
 processing log 0x2792f036:f6d21832 at index 58628 of catalog 0x13600004
 00000040:00020000:0.0:1431530471.926259:0:17345:0:(llog_lvfs.c:616:llog_lvfs_create())
 error looking up logfile 0x2792f036:0xf6d21832: rc -116
 00000040:00020000:0.0:1431530471.926261:0:17345:0:(llog_cat.c:174:llog_cat_id2handle())
 error opening log id 0x2792f036:f6d21832: rc -116
 00000040:00020000:0.0:1431530471.926262:0:17345:0:(llog_obd.c:320:cat_cancel_cb())
 Cannot find handle for log 0x2792f036: -116
 00000040:00080000:0.0:1431530471.926265:0:17345:0:(llog_obd.c:346:cat_cancel_cb())
 cancel log 0x2792f036:f6d21832 at index 58628 of catalog 0x13600004
 00000040:00080000:0.0:1431530471.926267:0:17345:0:(llog_obd.c:315:cat_cancel_cb())
 processing log 0x2793959d:f6d27b18 at index 58629 of catalog 0x13600004
 00000040:00020000:0.0:1431530471.926270:0:17345:0:(llog_lvfs.c:616:llog_lvfs_create())
 error looking up logfile 0x2793959d:0xf6d27b18: rc -116 ...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This seems to be a Lustre bug related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1749&quot; title=&quot;llog_lvfs_create()) error looking up logfile&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1749&quot;&gt;&lt;del&gt;LU-1749&lt;/del&gt;&lt;/a&gt;, which reports false error messages even after the CATALOG file is deleted. If not, are we seeing any issue here related to deleting files from the OBJECTS directory? Will that affect the MDT data or lead to any data corruption? &lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;</description>
                <environment>Lustre Server 2.1.6 </environment>
        <key id="30209">LU-6612</key>
            <summary>(llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="4">Incomplete</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="manish">Manish Patel</reporter>
                        <labels>
                    </labels>
                <created>Mon, 18 May 2015 14:42:33 +0000</created>
                <updated>Tue, 26 Sep 2023 08:00:36 +0000</updated>
                            <resolved>Tue, 5 Sep 2017 19:33:50 +0000</resolved>
                                    <version>Lustre 2.1.6</version>
                                    <fixVersion>Lustre 2.16.0</fixVersion>
                    <fixVersion>Lustre 2.15.4</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="115801" author="bfaccini" created="Tue, 19 May 2015 08:44:25 +0000"  >&lt;p&gt;Since the format of the following log confirms that you are running with the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1749&quot; title=&quot;llog_lvfs_create()) error looking up logfile&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1749&quot;&gt;&lt;del&gt;LU-1749&lt;/del&gt;&lt;/a&gt; in 2.1.6:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; 00000040:00020000:0.0:1431530471.926262:0:17345:0:(llog_obd.c:320:cat_cancel_cb())
 Cannot find handle for log 0x2792f036: -116
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I think you are safe.&lt;br/&gt;
Do you still see these messages ?&lt;/p&gt;</comment>
                            <comment id="115807" author="manish" created="Tue, 19 May 2015 13:47:07 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Yes, I still see those messages. On the other hand, do we still have to worry about the file system data, or do these error messages have nothing to do with the consistency of the data? Can you confirm?&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="115905" author="bfaccini" created="Tue, 19 May 2015 17:32:39 +0000"  >&lt;p&gt;Still with the same log names/ids and index numbers ?&lt;/p&gt;

&lt;p&gt;Hmm, in fact after re-reading your submission text, I wonder whether your OSTs were stopped at the time you removed the files from the OBJECTS directory and the CATALOG file on your MDT? If not, the messages may come from old/invalid LLOG references on the OSTs. If that has been the case, you may find corresponding messages on the OSSs; can you check?&lt;/p&gt;

&lt;p&gt;Last, are you using ChangeLogs ?&lt;/p&gt;</comment>
                            <comment id="115917" author="adilger" created="Tue, 19 May 2015 18:02:23 +0000"  >&lt;p&gt;Manish, the OBJECTS files are still in use in Lustre 2.1, so they can&apos;t just be deleted - they are used to track files being deleted on the MDT but not yet deleted on the OSTs.  It is only with Lustre 2.5 and later that the OBJECTS files are obsolete.&lt;/p&gt;

&lt;p&gt;I&apos;d agree with Bruno - it seems that these OBJECTS files are not being cleaned up because either the ChangeLog has been enabled but there is no consumer, or there is a missing or disabled OST that has objects deleted (or ownership changed) and the MDS is logging these changes until the OST is returned.&lt;/p&gt;

&lt;p&gt;Bruno, it is worthwhile to check whether disabling all ChangeLog consumers will delete the ChangeLog records, or whether they are left behind.&lt;/p&gt;</comment>
                            <comment id="115936" author="manish" created="Tue, 19 May 2015 20:41:57 +0000"  >&lt;p&gt;Hi Andreas,&lt;/p&gt;

&lt;p&gt;Changelog was enabled for a period of time until the changelog filled up, which resulted in client mount issues with &quot;Lustre clients receive -14 &quot;Bad address&quot; from MDS &lt;span class=&quot;error&quot;&gt;&amp;#91;REF:1638415019758&amp;#93;&lt;/span&gt;&quot;, and then the changelog user was unregistered to fix that.&lt;/p&gt;

&lt;p&gt;The current state is as follows:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@dc2mds01 ~&amp;#93;&lt;/span&gt;# cat /proc/fs/lustre/mdd/dc2-MDT0000/changelog_users&lt;br/&gt;
current index: 10970114278&lt;br/&gt;
ID index&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@dc2mds01 ~&amp;#93;&lt;/span&gt;# cat /proc/fs/lustre/mdd/dc2-MDT0000/changelog_mask&lt;br/&gt;
MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO OPEN CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;</comment>
                            <comment id="116134" author="bfaccini" created="Thu, 21 May 2015 19:40:59 +0000"  >&lt;p&gt;Manish,&lt;br/&gt;
My initial inclination is that the messages are harmless but I need to investigate more fully to confirm. I have several possible theories as to what kind of scenario could cause this situation. I will be in touch again when I have a clearer picture.&lt;br/&gt;
BTW, you did not answer my previous question about whether or not the OSTs were stopped at the time you removed the OBJECTS/* and CATALOG files on the MDT?&lt;/p&gt;</comment>
                            <comment id="116215" author="manish" created="Fri, 22 May 2015 16:41:14 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;The test the customer performed was on an isolated test server and network. I did not use any OSTs in my tests.&lt;/p&gt;

&lt;p&gt;Here is another question from the customer.&lt;/p&gt;

&lt;p&gt;&quot;I wasn&apos;t aware that interacting with the MDT while mounted as ldiskfs could trigger Lustre operations with the OSTs. I suppose I need to do a bit more reading.&quot;&lt;/p&gt;


&lt;p&gt;Currently the customer is planning, during upcoming maintenance, to resize the grown OI.16 file to free up some space.&lt;/p&gt;

&lt;p&gt;And if this resize of the oi.16 file causes issues, can we simply restore the original oi.16 file? Or must we restore the entire dd-based MDT backup?&lt;/p&gt;

&lt;p&gt;In either restoration case, how would we track/clean up any orphaned objects on the OSTs? Could we record enough information about every file we create during testing that we could use to surgically clean up? If so, how? Or, must we perform a full file system lfsck? Would we run into problems if we reverted to our backup and left a few hundred orphaned OST objects?&lt;/p&gt;

&lt;p&gt;1. Are we OK to delete those files from the OBJECTS directory with Lustre v2.1.6 to reclaim the extra space occupied by the files residing in the OBJECTS directory on the MDT, or should we upgrade Lustre to v2.5.x or higher and then delete the files residing in the OBJECTS directory?&lt;br/&gt;
2. How would we track/clean up any orphaned objects on the OSTs with Lustre v2.1.6? Is an lfsck an option here? If so, please provide the procedure to clean up the orphaned objects.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="116621" author="bfaccini" created="Wed, 27 May 2015 23:31:45 +0000"  >&lt;p&gt;Manish,&lt;br/&gt;
I am sorry but I am puzzled here. Do you mean that you got the original messages/errors for this ticket after restoring an MDT backup, removing the OBJECTS/* and CATALOG files, and then mounting it standalone (i.e., without the original OSTs)? And this just to test/see what would happen? Did I understand the situation correctly? Because if so, I don&apos;t think this &quot;test&quot; is representative of reality.&lt;/p&gt;

&lt;p&gt;And now you speak about removing the OI.16.* files?? Can you really explain the purpose of this test? To prepare for a 2.5 upgrade with a manual pre-cleaning of the MDT, and a possible downgrade due to problems? If this is the case, just do the upgrade and then run lfsck; it will do the cleanup for you, better than all of these manual and risky actions, you can trust me!&lt;/p&gt;

&lt;p&gt;To answer some of your numerous questions: if you remove all OI.16.* files, they will be recreated upon MDT start/mount. Removing the OBJECTS/* and CATALOG files should work even if there may be a few possible side effects/messages (like orphaned objects on OSTs, ...). If you want to record all the information needed to manually clean orphans on the OSTs after an MDT restore, the best choice is RobinHood; the &quot;offline&quot; lfsck version available in 2.1 can also do it for you, and the &quot;online&quot; lfsck version available in 2.5 will do it much better and more easily!&lt;/p&gt;</comment>
                            <comment id="117046" author="manish" created="Mon, 1 Jun 2015 15:13:55 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Yes, you understood correctly; we were just trying to see if there is any risk involved in the upcoming maintenance process, so we decided to postpone removing the files from the OBJECTS directory and will work on that part with the Lustre 2.5.x releases.&lt;/p&gt;

&lt;p&gt;Yes, we hit bug DDN-77, and to fix that issue we have to go through this process. Right now the customer is still testing Lustre v2.5 and wants to hold off on moving to v2.5 for now, so until then we have to go through this process of fixing the OI.16 file under Lustre v2.1.6.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;
</comment>
                            <comment id="118618" author="bfaccini" created="Tue, 16 Jun 2015 00:02:38 +0000"  >&lt;p&gt;Hello Manish, sorry to be late on this.&lt;br/&gt;
Not sure I fully understand your last comment, but according to my previous indications and the DDN-77 content, I am definitely convinced that any OBJECTS/* and OI file removal should only occur after the v2.5 upgrade and after a full filesystem stop/umount from clients and servers.&lt;/p&gt;</comment>
                            <comment id="118752" author="manish" created="Wed, 17 Jun 2015 04:24:13 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Well, we already did, e.g., &quot;mv OBJECTS OBJECTS.DDN-191; mv CATALOGS CATALOGS.DDN-191;&quot; in that directory when we encountered issues hitting the directory index limit in ticket DDN-191. The customer is currently still on Lustre v2.1.6.&lt;/p&gt;

&lt;p&gt;Since we have already renamed the original OBJECTS directory, which contains 487G of llogs, is it safe to delete the old OBJECTS directory to reclaim space during our next maintenance window?&lt;/p&gt;

&lt;p&gt;Also, why do logs continue to accumulate in the OBJECTS directory, and is there something we can do to stop it?&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;</comment>
                            <comment id="119399" author="manish" created="Tue, 23 Jun 2015 17:24:36 +0000"  >&lt;p&gt;Hi Bruno, Andreas,&lt;/p&gt;

&lt;p&gt;Any updates/comments on my previous questions? The next maintenance window is getting closer and we have to plan accordingly, so please advise on this issue.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="119679" author="bfaccini" created="Fri, 26 Jun 2015 02:21:38 +0000"  >&lt;p&gt;Hello Manish, sorry to be late on this.&lt;br/&gt;
I definitely need more info to determine what&apos;s going on at your site causing the new plain LLOGs to accumulate under the OBJECTS directory:&lt;br/&gt;
_ &quot;lctl dl&quot; cmd output on MDS&lt;br/&gt;
_ &quot;ls -li&quot; of MDS&apos;s /OBJECTS directory output, when ldiskfs-mounted&lt;br/&gt;
_ &quot;llog_reader CATALOGS&quot; cmd output, when ldiskfs-mounted&lt;br/&gt;
_ binary copy of CATALOGS file&lt;/p&gt;

&lt;p&gt;And yes, since you have indicated that you are already running fine with the moved OBJECTS/CATALOGS directory/file, it should be fine to remove the old/renamed OBJECTS directory, but I would suggest doing this during dedicated downtime with all clients/servers unmounted.&lt;/p&gt;</comment>
                            <comment id="120081" author="manish" created="Wed, 1 Jul 2015 17:20:08 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;The customer tried to run that command on their test cluster with the backup MDT data and got the error below:&lt;/p&gt;

&lt;p&gt;dc-oss01# llog_reader CATALOGS&lt;br/&gt;
Memory Alloc for recs_buf error.&lt;br/&gt;
Could not pack buffer; rc=-12&lt;/p&gt;

&lt;p&gt;Do you know of an alternative workaround to get that output? Their test cluster has 48G of RAM and the production cluster has 128G.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;</comment>
                            <comment id="120383" author="bfaccini" created="Mon, 6 Jul 2015 12:26:44 +0000"  >&lt;p&gt;Hello Manish,&lt;br/&gt;
I don&apos;t think this error is related to a real lack of memory; it is more likely the consequence of llog_reader mis-parsing the CATALOGS file format, where the following code:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;        /* the llog header not countable here.*/
        recs_num = le32_to_cpu((*llog)-&amp;gt;llh_count)-1;

        recs_buf = malloc(recs_num * sizeof(struct llog_rec_hdr *));
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;can easily return &amp;#45;ENOMEM if the (*llog)-&amp;gt;llh_count value is mistakenly read as 0.&lt;/p&gt;

&lt;p&gt;Can you attach the &quot;binary copy of the CATALOGS file&quot; I already asked you for? And also the other info I requested?&lt;/p&gt;

&lt;p&gt;Also, have you been able to remove the old/moved OBJECTS directory content/files?&lt;/p&gt;</comment>
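To illustrate the failure mode described above, here is a minimal sketch (in Python, to model the arithmetic) of why an llh_count of 0 surfaces as rc=-12 (-ENOMEM): recs_num = llh_count - 1 underflows to -1, and when C passes that signed value to malloc() it converts to an enormous size_t, so the allocation can never succeed. The helper names below (requested_alloc_bytes, simulated_malloc_rc) are illustrative assumptions, not part of the actual llog_reader source.

```python
POINTER_SIZE = 8          # sizeof(struct llog_rec_hdr *) on a 64-bit host
SIZE_MAX = 2**64 - 1      # size_t range on a 64-bit host


def requested_alloc_bytes(llh_count: int) -> int:
    """Model the C expression malloc(recs_num * sizeof(ptr)) where
    recs_num = llh_count - 1.  In C the (possibly negative) value is
    converted to size_t modulo 2**64, so llh_count == 0 wraps the
    request to a near-SIZE_MAX byte count."""
    recs_num = llh_count - 1
    return (recs_num * POINTER_SIZE) % (SIZE_MAX + 1)


def simulated_malloc_rc(llh_count: int) -> int:
    """Return 0 on success, -12 (-ENOMEM) when the modelled request is
    far beyond anything malloc() could satisfy."""
    ENOMEM = 12
    nbytes = requested_alloc_bytes(llh_count)
    if nbytes > 2**40:    # anything beyond ~1 TiB stands in for "fails"
        return -ENOMEM
    return 0
```

With llh_count read as 0 the modelled request wraps to 2**64 - 8 bytes, matching the &quot;Could not pack buffer; rc=-12&quot; symptom, while any llh_count of at least 1 yields a sane allocation size.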
                            <comment id="120476" author="manish" created="Mon, 6 Jul 2015 19:56:09 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Can you clarify what you mean by the &quot;binary copy of the CATALOGS file&quot;? Do we just need to mount the MDT with ldiskfs and use a normal cp command to copy that CATALOGS file and send it to you, or is there a different process involved?&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;</comment>
                            <comment id="120549" author="bfaccini" created="Tue, 7 Jul 2015 12:42:43 +0000"  >&lt;p&gt;Correct, just a raw copy of CATALOGS file.&lt;/p&gt;</comment>
                            <comment id="120572" author="manish" created="Tue, 7 Jul 2015 16:00:07 +0000"  >&lt;p&gt;Here is the CATALOGS file from MDT.&lt;/p&gt;</comment>
                            <comment id="120573" author="manish" created="Tue, 7 Jul 2015 16:00:34 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Attached is a copy of the current CATALOGS file.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.hpdd.intel.com/secure/attachment/18394/CATALOGS.20150707&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/secure/attachment/18394/CATALOGS.20150707&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The llog_reader command failed with the following error:&lt;/p&gt;

&lt;p&gt;Memory Alloc for recs_buf error.&lt;br/&gt;
Could not pack buffer; rc=-12&lt;/p&gt;

&lt;p&gt;So please advise how to get that output and what needs to be done to get rid of that &quot;recs_buf error&quot;.&lt;/p&gt;

&lt;p&gt;We removed the old CATALOGS and OBJECTS directories, which we renamed about a month ago. The customer site has maintenance scheduled for today, so it would be nice to have the next steps before we run out of this maintenance window.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="120636" author="bfaccini" created="Tue, 7 Jul 2015 21:38:21 +0000"  >&lt;p&gt;Thanks for the CATALOGS binary file, but what about the:&lt;br/&gt;
_ &quot;lctl dl&quot; cmd output on MDS&lt;br/&gt;
_ &quot;ls -li&quot; of MDS&apos;s /OBJECTS directory output, when ldiskfs-mounted&lt;br/&gt;
other info I already asked you to provide?&lt;br/&gt;
This info is also required to understand what is going on.&lt;/p&gt;

&lt;p&gt;Concerning the &quot;recs_buf error&quot; from the &quot;llog_reader CATALOGS&quot; command, it should come from the fact that &quot;llog_reader&quot; does not know how to interpret the CATALOGS format, but I can manage to extract the interesting information from the CATALOGS binary file.&lt;/p&gt;</comment>
                            <comment id="121045" author="manish" created="Fri, 10 Jul 2015 20:17:33 +0000"  >&lt;p&gt;The rest of the data Bruno requested.&lt;/p&gt;</comment>
                            <comment id="121046" author="manish" created="Fri, 10 Jul 2015 20:18:25 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Here are the other requested files.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.hpdd.intel.com/secure/attachment/18414/ddn_sr44330_lctl_dl.txt&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/secure/attachment/18414/ddn_sr44330_lctl_dl.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.hpdd.intel.com/secure/attachment/18415/ddn_sr44330_ls_objects.txt&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/secure/attachment/18415/ddn_sr44330_ls_objects.txt&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://jira.hpdd.intel.com/secure/attachment/18416/llog_reader.txt&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/secure/attachment/18416/llog_reader.txt&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="121683" author="bfaccini" created="Mon, 20 Jul 2015 16:00:45 +0000"  >&lt;p&gt;Hello Manish,&lt;br/&gt;
According to your latest provided files/info, here is what I think:&lt;br/&gt;
          _ the llog_reader.txt file shows the oi.16 file as being huge; as we already discussed, how to cure this has already been described to you in DDN-77.&lt;/p&gt;

&lt;p&gt;          _ the llog_reader.txt file also again highlights the llog_reader error when it is used on the CATALOG file, which is not of the expected LLOG format. I will push a patch soon to have llog_reader gracefully handle such a case.&lt;/p&gt;

&lt;p&gt;          _ the ddn_sr44330_ls_objects.txt file shows that there are 336 &quot;100uvw:68b06xyz&quot; small files (2 per OST, and half of them are the only ones referenced in the CATALOG file) in the OBJECTS/ dir. All are dated &quot;Jun  9 10:56&quot;. These should be the normal OSP sync LLOG catalogs.&lt;/p&gt;

&lt;p&gt;          _ the ddn_sr44330_ls_objects.txt file also shows 190 more recent files of reasonable size (all are &amp;lt; 3MB). Are these files the ones you suspect are leaking and had consumed huge space on your MDT before you removed them?&lt;/p&gt;

&lt;p&gt;          _ last, do you still see the original &quot;(llog_obd.c:320:cat_cancel_cb())&lt;br/&gt;
 Cannot find handle for log 0x2793959b: -116&quot; messages for this ticket?&lt;/p&gt;</comment>
                            <comment id="121684" author="manish" created="Mon, 20 Jul 2015 16:06:15 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;We have already cured the huge oi.16 file issue, and it has shrunk back to normal size. Regarding the rest of the log-file-related questions, I will have to check with the customer to get their input, and I will update this thread.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="121757" author="gerrit" created="Mon, 20 Jul 2015 22:14:58 +0000"  >&lt;p&gt;Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/15654&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/15654&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6612&quot; title=&quot;(llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6612&quot;&gt;&lt;del&gt;LU-6612&lt;/del&gt;&lt;/a&gt; utils: strengthen llog_reader vs wrong format/header&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: bdc32b6164c08484b2f88b1a3a0bac7b9f223dea&lt;/p&gt;</comment>
                            <comment id="122025" author="manish" created="Thu, 23 Jul 2015 17:09:06 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;It seems that any file more than a few hours old in the OBJECTS directory is leaked. It would be nice if you could take a look at the ls -lah output from the MDT OBJECTS directory in the attached file &quot;find_exec_ls_alh_manish.txt&quot; for a more complete history of the number of files, their timestamps, and their sizes. The file lists most of the nearly 74k files that were removed during our last maintenance window.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.hpdd.intel.com/secure/attachment/18474/find_exec_ls_alh_manish.txt&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/secure/attachment/18474/find_exec_ls_alh_manish.txt&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most recent llog_obd.c:320:cat_cancel_cb() error logged was on 7/9 for a different log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jul 8 00:06:11 dc2mds01 kernel: : LustreError: 8949:0:(llog_obd.c:320:cat_cancel_cb()) Cannot find handle for log 0x27924bbc: -116
Jul 8 00:06:11 dc2mds01 kernel: : LustreError: 8949:0:(llog_obd.c:320:cat_cancel_cb()) Skipped 26419 previous similar messages
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;</comment>
                            <comment id="122732" author="manish" created="Thu, 30 Jul 2015 17:27:34 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;The customer has a maintenance window scheduled for next Tuesday, so can you take a look at the file I listed in my previous post and see whether you need anything else collected from the customer during the maintenance downtime?&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish     &lt;/p&gt;</comment>
                            <comment id="122841" author="bfaccini" created="Fri, 31 Jul 2015 12:57:08 +0000"  >&lt;p&gt;Hello Manish,&lt;br/&gt;
In the last find_exec_ls_alh_manish.txt you provided, I don&apos;t see any files dated within the last two months (June/July)...&lt;br/&gt;
Also, it looks quite different from the previous ddn_sr44330_ls_objects.txt content I have already described!&lt;br/&gt;
Can you detail how/when these two &quot;ls&quot; command reports were generated?&lt;/p&gt;</comment>
                            <comment id="122864" author="manish" created="Fri, 31 Jul 2015 15:28:17 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;The customer took the MDT backup, loaded the MDT data on their test cluster, and mounted it with ldiskfs; that data was captured on May 12, 2015. The command used to generate the file &quot;find_exec_ls_alh_manish.txt&quot; is as below:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;find OBJECTS -name &quot;[0-9][0-9]*:[0-9a-f][0-9a-f]*&quot; -exec ls -lah {} \;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let me know if you need anything else to troubleshoot further.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;</comment>
                            <comment id="123073" author="bfaccini" created="Mon, 3 Aug 2015 19:30:40 +0000"  >&lt;p&gt;Hmm, I was also asking for some feedback about the differences I found in the previous ddn_sr44330_ls_objects.txt file/info...&lt;br/&gt;
BTW, if OBJECTS files still appear to leak on the customer site, it would be cool if you could attach some of them (old + recent), along with a recent version of the CATALOGS file.&lt;/p&gt;</comment>
                            <comment id="123254" author="manish" created="Tue, 4 Aug 2015 19:17:51 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;The customer collected a listing of the OBJECTS directory, a copy of the CATALOGS file, and a tarball of the OBJECTS directory. Here is the new data you requested.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://ddntsr.com/ftp/2015-08-04-SR44330_20150804.tar&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://ddntsr.com/ftp/2015-08-04-SR44330_20150804.tar&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let us know if you need anything else.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="124690" author="manish" created="Thu, 20 Aug 2015 15:02:48 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Any updates on the data I uploaded in the previous post?&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;</comment>
                            <comment id="125940" author="bfaccini" created="Tue, 1 Sep 2015 19:51:37 +0000"  >&lt;p&gt;Manish,&lt;br/&gt;
Sorry to be late on this, but unfortunately the &lt;a href=&quot;http://ddntsr.com/ftp/2015-08-04-SR44330_20150804.tar&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://ddntsr.com/ftp/2015-08-04-SR44330_20150804.tar&lt;/a&gt; returns me &quot;ERROR 404: Not Found.&quot; when I try to download it. Can you verify what is going on? Thanks in advance.&lt;/p&gt;</comment>
                            <comment id="125944" author="manish" created="Tue, 1 Sep 2015 20:50:24 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;I have uploaded those logs to the Intel FTP site, in a directory named after this Jira case, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6612&quot; title=&quot;(llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6612&quot;&gt;&lt;del&gt;LU-6612&lt;/del&gt;&lt;/a&gt;. Let me know if you are able to get those logs from there.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="126141" author="bfaccini" created="Thu, 3 Sep 2015 09:11:20 +0000"  >&lt;p&gt;Adding a recent email/answer sent to the concerned people, including Manish:&lt;br/&gt;
=======================================================&lt;br/&gt;
I have started to work on the latest info from the site that Manish has provided.&lt;br/&gt;
I am not fully done, because LLOG analysis is quite a long process, but my first results seem to indicate that the files growing/leaking in the OBJECTS directory are OSP sync logs. If further analysis of all the LLOGs confirms this, it means that either some of the osp_sync_threads are stuck, or some chown &amp;amp; unlink operations on the MDT are never committed by some OSTs, or, lastly, that some OSTs are stopped.&lt;br/&gt;
&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5297&quot; title=&quot;osp_sync_thread can&amp;#39;t handle invalid record gracefully&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5297&quot;&gt;&lt;del&gt;LU-5297&lt;/del&gt;&lt;/a&gt;, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6687&quot; title=&quot;ALL osp-sync in D state&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6687&quot;&gt;&lt;del&gt;LU-6687&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7079&quot; title=&quot;OSP shouldn&amp;#39;t discard requests due to imp_peer_committed_transno&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7079&quot;&gt;&lt;del&gt;LU-7079&lt;/del&gt;&lt;/a&gt; may be related.&lt;/p&gt;

&lt;p&gt;Since Manish is in copy, I am using this email to ask him for more info:&lt;br/&gt;
	_ does the site have 168 OSTs configured?&lt;br/&gt;
	_ do you again confirm that none of the OSTs are stopped?&lt;br/&gt;
	_ can you check the osp_sync_thread&#8217;s state on the MDS?&lt;br/&gt;
	_ how many times has the concerned Lustre filesystem been started by now? It would be nice to get the exact dates.&lt;br/&gt;
	_ do you confirm the site is running 2.1.6?&lt;/p&gt;

&lt;p&gt;Thanks again for your help,&lt;br/&gt;
============================================================&lt;/p&gt;</comment>
                            <comment id="126142" author="bfaccini" created="Thu, 3 Sep 2015 09:19:20 +0000"  >&lt;p&gt;I have done more work on the latest provided data, and here are the results:&lt;br/&gt;
               _ the CATALOGS file points to 168 Catalog files, presumably one for each OST/osp_sync log&lt;br/&gt;
               _ each of these catalogs contains only 5-7 valid+sparse records pointing to plain LLOG files.&lt;br/&gt;
               _ number of Catalog files (168) + number of all of their referenced LLOG files = number of files in the OBJECTS dir&lt;br/&gt;
               _ I have not checked all of the referenced LLOG files, but all those I looked at contain only a few sparse records of MDS_UNLINK_REC type.&lt;/p&gt;

&lt;p&gt;So &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7079&quot; title=&quot;OSP shouldn&amp;#39;t discard requests due to imp_peer_committed_transno&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7079&quot;&gt;&lt;del&gt;LU-7079&lt;/del&gt;&lt;/a&gt; looks like the best candidate, but I need to verify whether it also applies to 2.1.6.&lt;/p&gt;</comment>
                            <comment id="126157" author="manish" created="Thu, 3 Sep 2015 13:55:39 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;-&amp;gt; Yes, this site has 168 OSTs configured.&lt;br/&gt;
-&amp;gt; Yes, none of the OSTs are stopped.&lt;br/&gt;
-&amp;gt; Yes, the site is running 2.1.6&lt;/p&gt;

&lt;p&gt;I will have to get the rest of the data from the customer and post it once I get those details. So how do I check the &quot;osp_sync_thread&quot; state on the MDS? Is there a command I can use?&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="126192" author="pjones" created="Thu, 3 Sep 2015 16:35:59 +0000"  >&lt;p&gt;Bruno&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7079&quot; title=&quot;OSP shouldn&amp;#39;t discard requests due to imp_peer_committed_transno&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7079&quot;&gt;&lt;del&gt;LU-7079&lt;/del&gt;&lt;/a&gt; was only introduced with Lustre 2.4.x, so that cannot be the exact issue at play on this 2.1.6 deployment; I think that also discounts the request to check the osp_sync_thread&#8217;s state on the MDS. Are the other items still required? Is there anything else that DDN should gather to help support your ongoing investigation?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="126208" author="bfaccini" created="Thu, 3 Sep 2015 17:33:28 +0000"  >&lt;p&gt;It is now established that our 2.1 problem cannot be related/linked to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7079&quot; title=&quot;OSP shouldn&amp;#39;t discard requests due to imp_peer_committed_transno&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7079&quot;&gt;&lt;del&gt;LU-7079&lt;/del&gt;&lt;/a&gt;, and also that my previous request to check the &quot;osp_sync_thread state on MDS&quot; was inaccurate.&lt;br/&gt;
Would it be possible for you to provide the MDS and OSS syslogs/console history for the past 2-3 months?&lt;br/&gt;
On the other hand, I am parsing the 2.1.6 Lustre source code to find any code path that may explain such leakage.&lt;/p&gt;
</comment>
                            <comment id="126309" author="manish" created="Thu, 3 Sep 2015 23:16:08 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;I have uploaded the logs to the Intel FTP under the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6612&quot; title=&quot;(llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6612&quot;&gt;&lt;del&gt;LU-6612&lt;/del&gt;&lt;/a&gt; dir, with the file name &quot;2015-09-03-SR44330_es_lustre_showall_2015-09-03_151236.tar.bz2&quot;&lt;/p&gt;

&lt;p&gt;The file system has been restarted many times and it is going to be hard to get all that data, but here is the recent activity.&lt;/p&gt;

&lt;p&gt;06-03-2015 : The file system was taken down for the oi.16 shrink procedure and restarted during a scheduled maintenance window.&lt;br/&gt;
06-07-2015 - 06-09-2015 : The file system MDS experienced issues which required the MDT to be unmounted multiple times. It was during this unplanned outage that Andreas instructed us to rename the OBJECTS directory on the MDT.&lt;br/&gt;
07-07-2015 : The file system was restarted during a scheduled maintenance window&lt;br/&gt;
08-04-2015 : The file system was taken offline during a regular scheduled maintenance window to collect MDT OBJECTS and CATALOGS file information for Intel.&lt;br/&gt;
09-02-2015 : The file system was restarted to accommodate an MDT backup.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="126400" author="bfaccini" created="Fri, 4 Sep 2015 16:51:27 +0000"  >&lt;p&gt;Thanks for providing the logs.&lt;br/&gt;
It looks like the OSSs/OSTs quite frequently report the following error/msg sequence:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;messages:Sep  2 04:05:50 dc2oss12 kernel: : LustreError: 14888:0:(ldlm_resource.c:1090:ldlm_resource_get()) lvbo_init failed for resource 110829994: rc -2
messages:Sep  2 04:05:50 dc2oss12 kernel: : LustreError: 7212:0:(filter.c:3136:__filter_oa2dentry()) dc2-OST0075: filter_sync on non-existent object: 110829994:0 
messages:Sep  2 04:05:50 dc2oss12 kernel: : LustreError: 7212:0:(ost_handler.c:1648:ost_blocking_ast()) Error -2 syncing data on lock cancel
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I am presently analyzing the related code path to see if it can be the cause of the leaking MDS_UNLINK_REC records in the OSP-SYNC LLOGs.&lt;/p&gt;
</comment>
                            <comment id="126588" author="bfaccini" created="Mon, 7 Sep 2015 15:49:01 +0000"  >&lt;p&gt;There are also the following error msgs from filter_destroy(), which could be much more closely related:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ grep filter_destroy es_lustre_showall_2015-09-03_151236/*oss*/logs/messages-* 
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 03:23:13 dc2oss01 kernel: : [&amp;lt;ffffffffa0ca5a24&amp;gt;] filter_destroy+0xf24/0x13d0 [obdfilter]
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 03:24:56 dc2oss01 kernel: : LustreError: 15252:0:(filter.c:4267:filter_destroy()) error on commit, err = -30
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 03:24:56 dc2oss01 kernel: : LustreError: 15252:0:(filter.c:4267:filter_destroy()) Skipped 1 previous similar message
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:45 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy())  dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:45 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy())  dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:57 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy())  dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:57 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy()) Skipped 7 previous similar messages
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:59 dc2oss01 kernel: : LustreError: 16189:0:(filter.c:4151:filter_destroy())  dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:59 dc2oss01 kernel: : LustreError: 16189:0:(filter.c:4151:filter_destroy()) Skipped 9 previous similar messages
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:20:01 dc2oss01 kernel: : LustreError: 16189:0:(filter.c:4151:filter_destroy())  dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss02/logs/messages-20150816:Aug 15 10:56:03 dc2oss02 kernel: : [&amp;lt;ffffffffa0c2f218&amp;gt;] ? filter_destroy+0x718/0x13d0 [obdfilter]
es_lustre_showall_2015-09-03_151236/dc2oss08/logs/messages-20150823:Aug 18 06:29:24 dc2oss08 kernel: : [&amp;lt;ffffffffa0cb2218&amp;gt;] filter_destroy+0x718/0x13d0 [obdfilter]
es_lustre_showall_2015-09-03_151236/dc2oss10/logs/messages-20150809:Aug  5 14:31:51 dc2oss10 kernel: : LustreError: 15496:0:(filter.c:1621:filter_destroy_internal()) destroying objid 94281010 ino 36943382 nlink 1 count 3
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and could be the starting point of the leakage of MDS_UNLINK_REC records and their containing plain LLOG files.&lt;/p&gt;

&lt;p&gt;Since the problem seems to occur on any OST, I have decided to concentrate on OST0000; this OST&apos;s catalog presently contains only 5 references/records of plain LLOGs:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Header size : 8192
Time : Tue Jun  9 07:48:06 2015
Number of records: 5
Target uuid :
-----------------------
#01 (064)ogen=68B06AFD name=0:1048793
#102 (064)ogen=8E05AC60 name=0:1049057
#103 (064)ogen=B52DE2C3 name=0:1049144
#104 (064)ogen=EEE3D5B name=0:1049420
#193 (064)ogen=B9CE5F6A name=0:1049449
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If we follow the 1st one, it contains (like the others) a few valid records:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;rec #1 type=10612404 len=40   &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt; mainly MDS_UNLINK_REC records
rec #2 type=10612404 len=40
rec #3 type=10612404 len=40
rec #4 type=10612404 len=40
rec #5 type=10612404 len=40
rec #6 type=10612404 len=40
rec #7 type=10612404 len=40
rec #8 type=10612404 len=40
rec #9 type=10612404 len=40
rec #10 type=10612404 len=40
rec #11 type=10612404 len=40
rec #12 type=10612404 len=40
rec #13 type=10612404 len=40
rec #14 type=10612404 len=40
rec #15 type=10612404 len=40
rec #19 type=10612404 len=40
rec #20 type=10612404 len=40
rec #21 type=10612404 len=40
rec #24 type=10612404 len=40
rec #27 type=10612404 len=40
rec #28 type=10612404 len=40
rec #29 type=10612404 len=40
rec #30 type=10612404 len=40
rec #32 type=10612404 len=40
rec #33 type=10612404 len=40
rec #37 type=10612404 len=40
rec #38 type=10612404 len=40
rec #39 type=10612404 len=40
rec #42 type=10612404 len=40
rec #43 type=10612404 len=40
rec #44 type=10612404 len=40
rec #45 type=10612404 len=40
rec #46 type=10612404 len=40
rec #47 type=10612404 len=40
rec #52 type=10612404 len=40
rec #53 type=10612404 len=40
rec #54 type=10612404 len=40
rec #55 type=10612404 len=40
rec #56 type=10612404 len=40
rec #57 type=10612404 len=40
rec #58 type=10692401 len=56 &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt; but also a MDS_SETATTR64_REC
rec #59 type=10612404 len=40
rec #60 type=10612404 len=40
rec #61 type=10612404 len=40
rec #62 type=10612404 len=40
rec #63 type=10612404 len=40
rec #64 type=10612404 len=40
rec #65 type=10612404 len=40
rec #66 type=10612404 len=40
rec #67 type=10612404 len=40
rec #68 type=10612404 len=40
rec #69 type=10612404 len=40
rec #70 type=10612404 len=40
rec #71 type=10612404 len=40
rec #72 type=10612404 len=40
rec #73 type=10612404 len=40
rec #74 type=10612404 len=40
rec #75 type=10612404 len=40
rec #76 type=10612404 len=40
rec #77 type=10612404 len=40
rec #78 type=10612404 len=40
rec #79 type=10612404 len=40
rec #80 type=10612404 len=40
rec #81 type=10612404 len=40
rec #82 type=10612404 len=40
rec #83 type=10612404 len=40
rec #84 type=10612404 len=40
rec #85 type=10612404 len=40
rec #86 type=10612404 len=40
rec #87 type=10612404 len=40
rec #88 type=10612404 len=40
rec #89 type=10612404 len=40
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Thus, could you check if OBJID 0x532cb79 (87214969) still exists on OST0000, and also if OBJID 0x532ccc6 (87215302) still exists on OST0000, the latter with 0xb8517 (754967) uid and 0xd2 (210) gid?&lt;br/&gt;
To do so, you can use debugfs on the associated device and look for the OBJIDs with the following commands:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;debugfs -R &quot;stat O/0/d25/87214969&quot; /dev/...
debugfs -R &quot;stat O/0/d6/87215302&quot; /dev/...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Would be also very helpful if you could again provide a similar package containing MDS&apos;s CATALOGS and OBJECTS/* files, for me to understand problem&apos;s evolution/consequences based on the recently provided logs content.&lt;/p&gt;
</comment>
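The record types shown in the llog dump above (10612404, 10692401) are hex values assembled from constants in lustre/include/lustre/lustre_idl.h. A minimal Python sketch of the decoding follows; the constant values (LLOG_OP_MAGIC, MDS_REINT, and the REINT_* opcodes) are assumed from the b2_1-era headers and should be verified against the actual tree:

```python
# Sketch of llog record-type decoding; constant values assumed from
# lustre/include/lustre/lustre_idl.h (b2_1) -- verify against your tree.
LLOG_OP_MAGIC = 0x10600000
MDS_REINT = 36        # MDS reintegration opcode embedded in the type
REINT_SETATTR = 1
REINT_UNLINK = 4

MDS_UNLINK_REC = LLOG_OP_MAGIC | 0x10000 | (MDS_REINT << 8) | REINT_UNLINK
MDS_SETATTR64_REC = LLOG_OP_MAGIC | 0x90000 | (MDS_REINT << 8) | REINT_SETATTR

def rec_type_name(rec_type):
    """Map a numeric llog record type to its symbolic name."""
    return {
        MDS_UNLINK_REC: "MDS_UNLINK_REC",        # 0x10612404
        MDS_SETATTR64_REC: "MDS_SETATTR64_REC",  # 0x10692401
    }.get(rec_type, hex(rec_type))
```

With these assumed values, 0x10612404 decodes to MDS_UNLINK_REC and 0x10692401 to MDS_SETATTR64_REC, matching the annotations in the dump above.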
                            <comment id="126645" author="manish" created="Tue, 8 Sep 2015 14:47:45 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Can you provide more details on what exactly you need for &lt;/p&gt;

&lt;p&gt;&quot;Would be also very helpful if you could again provide a similar package containing MDS&apos;s CATALOGS and OBJECTS/* files, for me to understand problem&apos;s evolution/consequences based on the recently provided logs content.&quot;&lt;/p&gt;

&lt;p&gt;Also, if you need any output with the 0xb8517 (754967) uid and 0xd2 (210) gid, let us know the command to gather that data.&lt;/p&gt;

&lt;p&gt;I will ask the customer to get the debugfs output and update the post.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="126770" author="bfaccini" created="Wed, 9 Sep 2015 08:25:46 +0000"  >&lt;p&gt;&amp;gt; Can you provide more details on what exactly you need for&lt;br/&gt;
Exactly the same stuff/infos than what you have provided and commented in this ticket on 04/Aug/15, but from now.&lt;/p&gt;

&lt;p&gt;&amp;gt; Also do you need any out with 0xb8517 (754967) uid and 0xd2 (210) gid, then let us know the command to gather those data.&lt;br/&gt;
No, that will be part of the info returned for OBJID 0x532ccc6 (87215302).&lt;/p&gt;</comment>
                            <comment id="126811" author="manish" created="Wed, 9 Sep 2015 18:43:35 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Here is the debugfs output you requested; the rest of the data we can get during the next maintenance window, happening in October.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@dc2oss01 ~]# debugfs -R &quot;stat O/0/d25/87214969&quot; /dev/mapper/ost_dc2_0 2&amp;gt;&amp;amp;1 | tee /tmp/sr44330_87214969.log
debugfs 1.42.7.wc1 (12-Apr-2013)
O/0/d25/87214969: File not found by ext2_lookup
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@dc2oss01 ~]# debugfs -R &quot;stat O/0/d6/87215302&quot; /dev/mapper/ost_dc2_0 2&amp;gt;&amp;amp;1 | tee /tmp/sr44330_87215302.log
debugfs 1.42.7.wc1 (12-Apr-2013)
Inode: 94402427 Type: regular Mode: 0666 Flags: 0x80000
Generation: 1758127277 Version: 0x00000000:21734617
User: 754967 Group: 210 Size: 2914894
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 5696
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x55770077:00000000 -- Tue Jun 9 11:04:23 2015
atime: 0x55770086:00000000 -- Tue Jun 9 11:04:38 2015
mtime: 0x55770077:00000000 -- Tue Jun 9 11:04:23 2015
crtime: 0x5577003b:611b92d8 -- Tue Jun 9 11:03:23 2015 Size of extra inode fields: 28 Extended attributes stored in inode body:
fid = &quot;fe 0d 00 5b 76 f2 10 02 00 00 00 00 00 00 00 00 c6 cc 32 05 00 00 00 00
00 00 00 00 00 00 00 00 &quot; (32)
fid: objid=87215302 seq=0 parent=[0x210f2765b000dfe:0x0:0x0] stripe=0
EXTENTS:
(0-255):2009370624-2009370879, (256-711):2009379072-2009379527
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;</comment>
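The `O/0/d25/...` and `O/0/d6/...` components in the debugfs paths above come from ldiskfs OSTs hashing each object id into one of 32 `dNN` subdirectories under `O/<group>/`. A small sketch of that mapping, assuming the default of 32 subdirectories:

```python
def ost_object_path(objid, group=0, subdirs=32):
    """Build the on-disk path for an OST object under ldiskfs.

    Objects are hashed into d(objid % subdirs) subdirectories;
    32 is the usual subdirectory count (an assumption here).
    """
    return "O/%d/d%d/%d" % (group, objid % subdirs, objid)
```

For the two OBJIDs discussed above this reproduces the paths used in the debugfs commands: `O/0/d25/87214969` and `O/0/d6/87215302`.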
                            <comment id="126945" author="bfaccini" created="Thu, 10 Sep 2015 17:28:09 +0000"  >&lt;p&gt;Thanks Manish,&lt;br/&gt;
It is really interesting to see that, for the 2 leaked Unlink and Setattr records respectively, the 1st object is no longer present (but may simply have been unlinked as expected!), and the 2nd is present and has its uid/gid set as expected to the values in its LLOG record.&lt;br/&gt;
Thanks to this last info, I have looked further into the 2.1.6 source code and I think I now have a good idea of how LLOG records may leak. Based on this, I will try to set up a reproducer and also push a debug patch.&lt;br/&gt;
More to come soon now.&lt;/p&gt;</comment>
                            <comment id="127056" author="gerrit" created="Fri, 11 Sep 2015 08:58:59 +0000"  >&lt;p&gt;Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/16373&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16373&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6612&quot; title=&quot;(llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6612&quot;&gt;&lt;del&gt;LU-6612&lt;/del&gt;&lt;/a&gt; obdfilter: more debug upon filter_find_olg() failure&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_1&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 28b1448ac3f757c7df42cd15208cd36e4b20d4eb&lt;/p&gt;</comment>
                            <comment id="127111" author="manish" created="Fri, 11 Sep 2015 17:43:32 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;So does that patch contain the fix, or is it just a debug patch?&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;</comment>
                            <comment id="127171" author="pjones" created="Sat, 12 Sep 2015 14:46:07 +0000"  >&lt;p&gt;Manish&lt;/p&gt;

&lt;p&gt;Based on the comment above and the commit message, it seems that this is a debug patch.&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="127264" author="manish" created="Mon, 14 Sep 2015 18:23:53 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Were you able to create a reproducer and find anything?&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="127336" author="bfaccini" created="Tue, 15 Sep 2015 13:55:46 +0000"  >&lt;p&gt;Hello Manish,&lt;br/&gt;
Peter is right, my patch is only there to give more info on the code paths where I suspect we can leak the LLOG records that finally cause the orphan LLOG files accumulating under the OBJECTS directory.&lt;br/&gt;
I am presently working to reproduce while running with my patch; I will let you know soon about any progress/success.&lt;/p&gt;</comment>
                            <comment id="127777" author="bfaccini" created="Fri, 18 Sep 2015 14:42:24 +0000"  >&lt;p&gt;Unfortunately I am presently unable to reproduce, even when trying to create scenarios likely to generate the previously suspected msgs ...&lt;br/&gt;
On the other hand, working on the reproducer has permitted me to find other code paths where there seems to be the possibility of record leakage in the MDS&amp;lt;-&amp;gt;OSTs SYNC LLOGs. Recovery code may also be concerned, because the MDS should be able to ask the OSTs to replay such recorded operations upon restart ...&lt;br/&gt;
Due to this, I am presently writing an enhanced version of my previous debug patch; if its further testing is successful, do you think there is a chance that it could be installed on-site? MDS/OSSs are targeted.&lt;/p&gt;

&lt;p&gt;Also, about the 2.1.6 version you have claimed for this problem: the logs you have provided state &quot;Lustre: Build Version: EXAScaler-ddn1.0--PRISTINE-2.6.32-358.11.1.el6_lustre.es359.devel.x86_64&quot;. Can you give me more details about the exact content (the tag it is based on, additional patches, ...)?&lt;/p&gt;

&lt;p&gt;And again, as I have already requested in my 2 previous updates on 7+9/Sep/15, can you provide &#171; Exactly the same stuff/infos than what you have provided and commented in this ticket on 04/Aug/15, but from now &#187;?&lt;/p&gt;</comment>
                            <comment id="127784" author="manish" created="Fri, 18 Sep 2015 15:17:02 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Yes, if you can provide that new debug patch in a few days, then we can build new RPMs, test it, and install it at the next maintenance window, most likely happening on October/06; at the same time we can collect the same set of data that was provided on 04/Aug/15.&lt;/p&gt;

&lt;p&gt;Here are more details about this Lustre build and the patches applied to it.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.hpdd.intel.com/secure/attachment/18930/IU_Lustre_Build_Patches_06_08_2015.txt&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/secure/attachment/18930/IU_Lustre_Build_Patches_06_08_2015.txt&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="129304" author="manish" created="Mon, 5 Oct 2015 13:07:21 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Let us know if you need anything else other than the below listed data:&lt;/p&gt;

&lt;p&gt;a listing of the OBJECTS directory, a copy of the CATALOGS file, and a tarball of OBJECTS directory.&lt;/p&gt;

&lt;p&gt;The customer is having a downtime tomorrow; if you have that enhanced version of the patch ready, let us know.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish &lt;/p&gt;</comment>
                            <comment id="129428" author="bfaccini" created="Tue, 6 Oct 2015 07:38:02 +0000"  >&lt;p&gt;Hello Manish,&lt;br/&gt;
&amp;gt; a listing of the OBJECTS directory, a copy of the CATALOGS file, and a tarball of OBJECTS directory.&lt;br/&gt;
Yes, that will be very helpful, but even more if you can get the same infos after filesystem has been fully stopped, and again after filesystem has been fully re-started (and any recovery is completed).&lt;/p&gt;

&lt;p&gt;If you could also ensure the whole restart is run with the default (ioctl neterror warning error emerg ha config console lfsck) debug mask set on the MDS, and then take a full trace at the end of restart/recovery, that would also be very helpful.&lt;/p&gt;

&lt;p&gt;All MDS/OSS syslogs/messages taken some time after the restart will also be required.&lt;/p&gt;

&lt;p&gt;Last, you can add/use my debug patch available at &lt;a href=&quot;http://review.whamcloud.com/16373&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16373&lt;/a&gt; for the Lustre version to be used for the restart; it will print more debug msgs/info on some identified code paths that may cause the unlink/setattr LLOG record leakage.&lt;/p&gt;</comment>
                            <comment id="129436" author="bfaccini" created="Tue, 6 Oct 2015 13:08:40 +0000"  >&lt;p&gt;If the customer agrees to set the MDS debug mask as I have requested (and also to grow the size of the log buffer), here is the procedure in more detail.&lt;br/&gt;
To be sure that you catch and log the whole restart and recovery process on the MDS node, you will need to first load the libcfs module by running the &quot;modprobe libcfs&quot; command and change/verify the debug log mask (echo &quot;ioctl neterror warning error emerg ha config console lfsck&quot; &amp;gt;/proc/sys/lnet/debug) and buffer size (echo 8192 &amp;gt;/proc/sys/lnet/debug_mb, for 8GB, but you can use less if the MDS memory size is small), before mounting the MDT.&lt;br/&gt;
Then you will need to dump the debug log to a disk file using the &quot;lctl dk /tmp/lustre_debug.log&quot; command, right after the filesystem start has completed.&lt;/p&gt;

&lt;p&gt;Also, you can reset the mask and log buffer size to their original values after the filesystem has been fully started.&lt;/p&gt;</comment>
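When sifting through a large `lctl dk` dump like the one requested above, a small parser helps isolate the interesting lines. This sketch assumes the `subsys:mask:cpu:time:?:pid:?:(file:line:func()) message` layout seen in the traces quoted in this ticket; the two fields marked `?` are left undecoded:

```python
import re

# Parses one line of `lctl dk` output, e.g.:
#   00002000:00000001:27.0:1445357157.777097:0:5009:0:\
#       (filter_log.c:166:filter_recov_log_unlink_cb()) Process entered
# The 5th and 7th numeric fields are not decoded here.
LINE_RE = re.compile(
    r"^(?P<subsys>[0-9a-f]+):(?P<mask>[0-9a-f]+):"
    r"(?P<cpu>[\d.]+):(?P<time>[\d.]+):"
    r"\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<lineno>\d+):(?P<func>\w+)\(\)\)\s*"
    r"(?P<msg>.*)$"
)

def parse_dk_line(line):
    """Return a dict of fields for a debug-log line, or None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None
```

Filtering a multi-gigabyte dump for, say, every cat_cancel_cb() or filter_destroy() message then becomes a simple loop over the file.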
                            <comment id="129447" author="manish" created="Tue, 6 Oct 2015 13:54:59 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;We need more clarification about this part. &lt;/p&gt;

&lt;p&gt;&quot;Yes, that will be very helpful, but even more if you can get the same infos after filesystem has been fully stopped, and again after filesystem has been fully re-started (and any recovery is completed).&quot;&lt;/p&gt;

&lt;p&gt;So do you mean that after a full stop we collect the data using ldiskfs, then bring the file system back online, and once recovery is completed we take the file system down again, mount it with ldiskfs, and collect the same set of data? Or do you have another way to collect that data while the file system is up and online, using &apos;debugfs&apos; or another method? Can you please clarify this part and provide detailed steps to get the data?&lt;/p&gt;

&lt;p&gt;Also, we won&apos;t be able to deploy the patch today since it has to go through their test-lab testing; it will probably be deployed at the next maintenance window.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                   Manish&lt;/p&gt;</comment>
                            <comment id="129457" author="bfaccini" created="Tue, 6 Oct 2015 14:22:40 +0000"  >&lt;p&gt;You will not need to stop the filesystem again: you can safely mount the MDT as ldiskfs, but from the same node/MDS only, and in parallel with normal operations, as long as you don&apos;t modify anything on it. You can also use debugfs if you want, but dumping all the OBJECTS/ dir files could be a long process.&lt;/p&gt;

&lt;p&gt;Ok, for the patch I will let you know if I modify/enhance it in between.&lt;/p&gt;</comment>
                            <comment id="129535" author="manish" created="Tue, 6 Oct 2015 19:07:26 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;Here are the logs which you requested.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://ddntsr.com/ftp/2015-10-06-SR44330_20151006.tar&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://ddntsr.com/ftp/2015-10-06-SR44330_20151006.tar&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It contains the following files:&lt;/p&gt;

&lt;p&gt;SR44330_ls-li_MDT_OBJECTS_20151006.txt&lt;br/&gt;
SR44330_MDT_CATALOGS_201501006.txt&lt;br/&gt;
SR44330_MDT_OBJECTS_20151006.tar.gz&lt;br/&gt;
lustre_debug.20151006.log&lt;br/&gt;
SR44330_ls-li_MDT_OBJECTS_20151006b.txt&lt;br/&gt;
SR44330_MDT_CATALOGS_201501006b.txt&lt;br/&gt;
SR44330_MDT_OBJECTS_20151006b.tar.gz&lt;/p&gt;

&lt;p&gt;The files with a suffix of *b.txt and *b.tar.gz were collected after the customer brought the file system back online.&lt;/p&gt;

&lt;p&gt;Here are the syslogs if needed.&lt;br/&gt;
&lt;a href=&quot;http://ddntsr.com/ftp/2015-10-06-SR44330_es_lustre_showall_2015-10-06_125006.tar.bz2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://ddntsr.com/ftp/2015-10-06-SR44330_es_lustre_showall_2015-10-06_125006.tar.bz2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let us know if you need anything else.&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="129816" author="bfaccini" created="Thu, 8 Oct 2015 12:17:57 +0000"  >&lt;p&gt;Thanks Manish, I am presently reviewing all of these new infos, will let you know asap about any progress.&lt;/p&gt;</comment>
                            <comment id="130987" author="bfaccini" created="Wed, 21 Oct 2015 09:41:51 +0000"  >&lt;p&gt;Sorry, it took me some time to parse all this new data ...&lt;br/&gt;
The MDS debug log taken upon FS restart, when the Unlink/Setattr LLOG records are expected to be replayed between the MDS and each OST, is particularly interesting because it shows that the concerned LLOG files of each OST are parsed, yet their records are not canceled/removed, so the LLOG files are kept (more debug levels would be needed to understand why).&lt;br/&gt;
Thus I have decided to again try to reproduce the problem in-house using the following scenario:&lt;br/&gt;
        _ Client creates a file&lt;br/&gt;
        _ deactivate the MDT/OST connection&lt;br/&gt;
        _ Client removes the file&lt;br/&gt;
        _ stop MDT&lt;br/&gt;
        _ set full debug&lt;br/&gt;
        _ restart MDT&lt;br/&gt;
And this permitted me to find that, even though each MDS/OST Unlink/Setattr LLOG record is parsed/replayed, they are effectively never canceled:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00002000:00000001:27.0:1445357157.777095:0:5009:0:(filter_log.c:266:filter_recov_log_mds_ost_cb()) Process entered
00002000:00000001:27.0:1445357157.777097:0:5009:0:(filter_log.c:166:filter_recov_log_unlink_cb()) Process entered
00002000:00000010:27.0:1445357157.777099:0:5009:0:(filter_log.c:169:filter_recov_log_unlink_cb()) slab-alloced &apos;((oa))&apos;: 208 at ffff8808362f4cb0.
00002000:00000001:27.0:1445357157.777103:0:5009:0:(filter.c:4119:filter_destroy()) Process entered
00002000:00000001:27.0:1445357157.777105:0:5009:0:(filter_capa.c:127:filter_auth_capa()) Process entered
00002000:00000001:27.0:1445357157.777107:0:5009:0:(filter_capa.c:141:filter_auth_capa()) Process leaving (rc=0 : 0 : 0)
00002000:00000002:27.0:1445357157.777110:0:5009:0:(filter.c:4133:filter_destroy()) lustre-OST0000: filter_destroy(group=0,oid=2)
00002000:00000001:27.0:1445357157.777113:0:5009:0:(filter.c:1474:filter_fid2dentry()) Process entered
00002000:00000002:27.0:1445357157.777116:0:5009:0:(filter.c:1499:filter_fid2dentry()) looking up object O/d2/2
00002000:00000002:27.0:1445357157.777177:0:5009:0:(filter.c:1518:filter_fid2dentry()) got child objid 2: ffff880834151740, count = 1
00002000:00000001:27.0:1445357157.777180:0:5009:0:(filter.c:1522:filter_fid2dentry()) Process leaving (rc=18446612167547754304 : -131906161797312 : ffff880834151740)
00002000:00000002:27.0:1445357157.777183:0:5009:0:(filter.c:4142:filter_destroy()) destroying non-existent object 2:0
00002000:00000001:27.0:1445357157.777185:0:5009:0:(filter.c:4161:filter_destroy()) Process leaving via cleanup (rc=18446744073709551614 : -2 : 0xfffffffffffffffe)
00002000:00000002:27.0:1445357157.777187:0:5009:0:(filter.c:221:f_dput()) putting 2: ffff880834151740, count = 0
00002000:00000001:27.0:1445357157.777191:0:5009:0:(lustre_quota.h:653:lquota_adjust()) Process entered
00040000:00000001:27.0:1445357157.777193:0:5009:0:(quota_master.c:577:filter_quota_adjust()) Process entered
00040000:00000001:27.0:1445357157.777195:0:5009:0:(quota_master.c:580:filter_quota_adjust()) Process leaving (rc=0 : 0 : 0)
00002000:00000001:27.0:1445357157.777197:0:5009:0:(lustre_quota.h:657:lquota_adjust()) Process leaving (rc=0 : 0 : 0)
00002000:00000010:27.0:1445357157.777198:0:5009:0:(filter_log.c:201:filter_recov_log_unlink_cb()) slab-freed &apos;((oa))&apos;: 208 at ffff8808362f4cb0.
00002000:00000001:27.0:1445357157.777200:0:5009:0:(filter_log.c:203:filter_recov_log_unlink_cb()) Process leaving (rc=0 : 0 : 0)
00002000:00000001:27.0:1445357157.777202:0:5009:0:(filter_log.c:313:filter_recov_log_mds_ost_cb()) Process leaving (rc=0 : 0 : 0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and according to the code in b2_1, cancellation should occur by calling either llog_cancel() or filter_cancel_cookies_cb(), directly or as a callback, but only if OBD_MD_FLCOOKIE is set in the obdo valid flags; this holds for both the unlink and setattr record cases:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
int filter_setattr_internal(struct obd_export *exp, struct dentry *dentry,
                            struct obdo *oa, struct obd_trans_info *oti)
{
        unsigned int orig_ids[MAXQUOTAS] = {0, 0};
        struct llog_cookie *fcc = NULL;
        struct filter_obd *filter;
        int rc, err, sync = 0;

....................

        if (oa-&amp;gt;o_valid &amp;amp; OBD_MD_FLCOOKIE) {
                OBD_ALLOC(fcc, sizeof(*fcc));
                if (fcc != NULL)
                        *fcc = oa-&amp;gt;o_lcookie;
        }

...................

        if (oa-&amp;gt;o_valid &amp;amp; OBD_MD_FLFLAGS) {
                rc = fsfilt_iocontrol(exp-&amp;gt;exp_obd, dentry,
                                      FSFILT_IOC_SETFLAGS, (long)&amp;amp;oa-&amp;gt;o_flags);
        } else {
                rc = fsfilt_setattr(exp-&amp;gt;exp_obd, dentry, handle, &amp;amp;iattr, 1);
                if (fcc != NULL)
                        /* set cancel cookie callback function */
                        sync = fsfilt_add_journal_cb(exp-&amp;gt;exp_obd, 0, handle,
                                                     filter_cancel_cookies_cb,    &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;
                                                     fcc);
        }

...................

        if (sync) {
                filter_cancel_cookies_cb(exp-&amp;gt;exp_obd, 0, fcc, rc);
                fcc = NULL;
        }

..........................

int filter_destroy(struct obd_export *exp, struct obdo *oa,
                   struct lov_stripe_md *md, struct obd_trans_info *oti,
                   struct obd_export *md_exp, void *capa)
{
        unsigned int qcids[MAXQUOTAS] = {0, 0};
        struct obd_device *obd;
        struct filter_obd *filter;
        struct dentry *dchild = NULL, *dparent = NULL;
        struct lustre_handle lockh = { 0 };
        struct lvfs_run_ctxt saved;
        void *handle = NULL;
        struct llog_cookie *fcc = NULL;
        int rc, rc2, cleanup_phase = 0, sync = 0;
        struct iattr iattr;
        unsigned long now;
        ENTRY;

        rc = filter_auth_capa(exp, NULL, oa-&amp;gt;o_seq,
                              (struct lustre_capa *)capa, CAPA_OPC_OSS_DESTROY);
        if (rc)
                RETURN(rc);

        obd = exp-&amp;gt;exp_obd;
        filter = &amp;amp;obd-&amp;gt;u.filter;

        push_ctxt(&amp;amp;saved, &amp;amp;obd-&amp;gt;obd_lvfs_ctxt, NULL);
        cleanup_phase = 1;

        CDEBUG(D_INODE, &quot;%s: filter_destroy(group=&quot;LPU64&quot;,oid=&quot;
               LPU64&quot;)\n&quot;, obd-&amp;gt;obd_name, oa-&amp;gt;o_seq, oa-&amp;gt;o_id);

        dchild = filter_fid2dentry(obd, NULL, oa-&amp;gt;o_seq, oa-&amp;gt;o_id);
        if (IS_ERR(dchild))
                GOTO(cleanup, rc = PTR_ERR(dchild));
        cleanup_phase = 2;

        if (dchild-&amp;gt;d_inode == NULL) {
                CDEBUG(D_INODE, &quot;destroying non-existent object &quot;POSTID&quot;\n&quot;,
                       oa-&amp;gt;o_id, oa-&amp;gt;o_seq);
                /* If object already gone, cancel cookie right now */
                if (oa-&amp;gt;o_valid &amp;amp; OBD_MD_FLCOOKIE) {
                        struct llog_ctxt *ctxt;
                        struct obd_llog_group *olg;

                        olg = filter_find_olg(obd, oa-&amp;gt;o_seq);
                        if (!olg) {
                                CERROR(&quot;%s: can not find olg of group %d &quot;
                                       &quot;for objid &quot;LPU64&quot;\n&quot;, obd-&amp;gt;obd_name,
                                      (int)oa-&amp;gt;o_seq, oa-&amp;gt;o_id);
                               GOTO(cleanup, rc = PTR_ERR(olg));
                        }
                        fcc = &amp;amp;oa-&amp;gt;o_lcookie;
                        ctxt = llog_group_get_ctxt(olg, fcc-&amp;gt;lgc_subsys + 1);
                        llog_cancel(ctxt, NULL, 1, fcc, 0);        &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;
                        llog_ctxt_put(ctxt);
                        fcc = NULL; /* we didn&apos;t allocate fcc, don&apos;t free it */
                }
                GOTO(cleanup, rc = -ENOENT);
        }

        rc = filter_prepare_destroy(obd, oa-&amp;gt;o_id, oa-&amp;gt;o_seq, &amp;amp;lockh);
        if (rc)
                GOTO(cleanup, rc);

        /* Our MDC connection is established by the MDS to us */
        if (oa-&amp;gt;o_valid &amp;amp; OBD_MD_FLCOOKIE) {
                OBD_ALLOC(fcc, sizeof(*fcc));
                if (fcc != NULL)
                        *fcc = oa-&amp;gt;o_lcookie;   &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;
        }

        /* we&apos;re gonna truncate it first in order to avoid possible deadlock:
         *      P1                      P2
         * open trasaction      open transaction
         * down(i_zombie)       down(i_zombie)
         *                      restart transaction
         * (see BUG 4180) -bzzz
         *
         * take i_alloc_sem too to prevent other threads from writing to the
         * file while we are truncating it. This can cause lock ordering issue
         * between page lock, i_mutex &amp;amp; starting new journal handle.
         * (see bug 20321) -johann
         */
        now = jiffies;
        down_write(&amp;amp;dchild-&amp;gt;d_inode-&amp;gt;i_alloc_sem);
        LOCK_INODE_MUTEX(dchild-&amp;gt;d_inode);
        fsfilt_check_slow(exp-&amp;gt;exp_obd, now, &quot;i_alloc_sem and i_mutex&quot;);

        /* VBR: version recovery check */
        rc = filter_version_get_check(exp, oti, dchild-&amp;gt;d_inode);
        if (rc) {
                UNLOCK_INODE_MUTEX(dchild-&amp;gt;d_inode);
                up_write(&amp;amp;dchild-&amp;gt;d_inode-&amp;gt;i_alloc_sem);
                GOTO(cleanup, rc);
        }

        handle = fsfilt_start_log(obd, dchild-&amp;gt;d_inode, FSFILT_OP_SETATTR,
                                  NULL, 1);
        if (IS_ERR(handle)) {
                UNLOCK_INODE_MUTEX(dchild-&amp;gt;d_inode);
                up_write(&amp;amp;dchild-&amp;gt;d_inode-&amp;gt;i_alloc_sem);
                GOTO(cleanup, rc = PTR_ERR(handle));
        }

        /* Locking order: i_mutex -&amp;gt; journal_lock -&amp;gt; dqptr_sem. LU-952 */
        ll_vfs_dq_init(dchild-&amp;gt;d_inode);

        iattr.ia_valid = ATTR_SIZE;
        iattr.ia_size = 0;
        rc = fsfilt_setattr(obd, dchild, handle, &amp;amp;iattr, 1);
        rc2 = fsfilt_commit(obd, dchild-&amp;gt;d_inode, handle, 0);
        UNLOCK_INODE_MUTEX(dchild-&amp;gt;d_inode);
        up_write(&amp;amp;dchild-&amp;gt;d_inode-&amp;gt;i_alloc_sem);
        if (rc)
                GOTO(cleanup, rc);
        if (rc2)
                GOTO(cleanup, rc = rc2);

        /* We don&apos;t actually need to lock the parent until we are unlinking
         * here, and not while truncating above.  That avoids holding the
         * parent lock for a long time during truncate, which can block other
         * threads from doing anything to objects in that directory. bug 7171 */
        dparent = filter_parent_lock(obd, oa-&amp;gt;o_seq, oa-&amp;gt;o_id);
        if (IS_ERR(dparent))
                GOTO(cleanup, rc = PTR_ERR(dparent));
        cleanup_phase = 3; /* filter_parent_unlock */

        LOCK_INODE_MUTEX(dchild-&amp;gt;d_inode);
        handle = fsfilt_start_log(obd, dparent-&amp;gt;d_inode,FSFILT_OP_UNLINK,oti,1);
        if (IS_ERR(handle)) {
                UNLOCK_INODE_MUTEX(dchild-&amp;gt;d_inode);
                GOTO(cleanup, rc = PTR_ERR(handle));
        }
        cleanup_phase = 4; /* fsfilt_commit */

        /* Quota release need uid/gid of inode */
        obdo_from_inode(oa, dchild-&amp;gt;d_inode, NULL, OBD_MD_FLUID|OBD_MD_FLGID);

        filter_fmd_drop(exp, oa-&amp;gt;o_id, oa-&amp;gt;o_seq);

        /* this drops dchild-&amp;gt;d_inode-&amp;gt;i_mutex unconditionally */
        rc = filter_destroy_internal(obd, oa-&amp;gt;o_id, oa-&amp;gt;o_seq, dparent, dchild);

        EXIT;
cleanup:
        switch(cleanup_phase) {
        case 4:
                if (fcc != NULL)
                        sync = fsfilt_add_journal_cb(obd, 0, oti ?
                                                     oti-&amp;gt;oti_handle : handle,
                                                     filter_cancel_cookies_cb, &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;
                                                     fcc);
                /* If add_journal_cb failed, then filter_finish_transno
                 * will commit the handle and we will do a sync
                 * on commit. then we call callback directly to free
                 * the fcc.
                 */
                rc = filter_finish_transno(exp, NULL, oti, rc, sync);
                if (sync) {
                        filter_cancel_cookies_cb(obd, 0, fcc, rc); &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;
                        fcc = NULL;
                }
                rc2 = fsfilt_commit(obd, dparent-&amp;gt;d_inode, handle, 0);
                if (rc2) {
                        CERROR(&quot;error on commit, err = %d\n&quot;, rc2);
                        if (!rc)
                                rc = rc2;
                } else {
                        fcc = NULL;
                }
        case 3:
                filter_parent_unlock(dparent);
        case 2:
                filter_fini_destroy(obd, &amp;amp;lockh);

                f_dput(dchild);
                if (fcc != NULL)
                        OBD_FREE(fcc, sizeof(*fcc));
        case 1:
                pop_ctxt(&amp;amp;saved, &amp;amp;obd-&amp;gt;obd_lvfs_ctxt, NULL);
                break;
        default:
                CERROR(&quot;invalid cleanup_phase %d\n&quot;, cleanup_phase);
                LBUG();
        }

        /* trigger quota release */
        qcids[USRQUOTA] = oa-&amp;gt;o_uid;
        qcids[GRPQUOTA] = oa-&amp;gt;o_gid;
        rc2 = lquota_adjust(filter_quota_interface_ref, obd, qcids, NULL, rc,
                            FSFILT_OP_UNLINK);
        if (rc2)
                CERROR(&quot;filter adjust qunit! (rc:%d)\n&quot;, rc2);
        return rc;
}

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;then, having a look at the code of the callers:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/* Callback for processing the unlink log record received from MDS by
 * llog_client_api. */
static int filter_recov_log_unlink_cb(struct llog_ctxt *ctxt,
                                      struct llog_rec_hdr *rec,
                                      struct llog_cookie *cookie)
{
        struct obd_export *exp = ctxt-&amp;gt;loc_obd-&amp;gt;obd_self_export;
        struct llog_unlink_rec *lur;
        struct obdo *oa;
        obd_count count;
        int rc = 0;
        ENTRY;

        lur = (struct llog_unlink_rec *)rec;
        OBDO_ALLOC(oa);
        if (oa == NULL)
                RETURN(-ENOMEM);
        oa-&amp;gt;o_valid |= OBD_MD_FLCOOKIE;  &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;
        oa-&amp;gt;o_id = lur-&amp;gt;lur_oid;
        oa-&amp;gt;o_seq = lur-&amp;gt;lur_oseq;
        oa-&amp;gt;o_valid = OBD_MD_FLID | OBD_MD_FLGROUP;  &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt; bad !!!
        oa-&amp;gt;o_lcookie = *cookie;
        /* objid gap may require to destroy several objects in row */
        count = lur-&amp;gt;lur_count + 1;

        /* This check is only valid before FID-on-OST and it should
         * be removed after FID-on-OST is implemented */
        if (oa-&amp;gt;o_seq &amp;gt; FID_SEQ_OST_MAX) {
                CERROR(&quot;%s: invalid group number &quot;LPU64&quot; &amp;gt; MAX_CMD_GROUP %u\n&quot;,
                        exp-&amp;gt;exp_obd-&amp;gt;obd_name, oa-&amp;gt;o_seq, FID_SEQ_OST_MAX);
                RETURN(-EINVAL);
        }

        while (count &amp;gt; 0) {
                rc = filter_destroy(exp, oa, NULL, NULL, NULL, NULL);
                if (rc == 0)
                        CDEBUG(D_RPCTRACE, &quot;object &quot;LPU64&quot; is destroyed\n&quot;,
                               oa-&amp;gt;o_id);
                else if (rc != -ENOENT)
                        CEMERG(&quot;error destroying object &quot;LPU64&quot;: %d\n&quot;,
                               oa-&amp;gt;o_id, rc);
                else
                        rc = 0;
                count--;
                oa-&amp;gt;o_id++;
        }
        OBDO_FREE(oa);

        RETURN(rc);
}

/* Callback for processing the setattr log record received from MDS by
 * llog_client_api. */
static int filter_recov_log_setattr_cb(struct llog_ctxt *ctxt,
                                       struct llog_rec_hdr *rec,
                                       struct llog_cookie *cookie)
{
        struct obd_device *obd = ctxt-&amp;gt;loc_obd;
        struct obd_export *exp = obd-&amp;gt;obd_self_export;
        struct obd_info oinfo = { { { 0 } } };
        obd_id oid;
        int rc = 0;
        ENTRY;

        OBDO_ALLOC(oinfo.oi_oa);
        if (oinfo.oi_oa == NULL)
                RETURN(-ENOMEM);

        if (rec-&amp;gt;lrh_type == MDS_SETATTR_REC) {
                struct llog_setattr_rec *lsr = (struct llog_setattr_rec *)rec;

                oinfo.oi_oa-&amp;gt;o_id = lsr-&amp;gt;lsr_oid;
                oinfo.oi_oa-&amp;gt;o_seq = lsr-&amp;gt;lsr_oseq;
                oinfo.oi_oa-&amp;gt;o_uid = lsr-&amp;gt;lsr_uid;
                oinfo.oi_oa-&amp;gt;o_gid = lsr-&amp;gt;lsr_gid;
        } else {
                struct llog_setattr64_rec *lsr = (struct llog_setattr64_rec *)rec;

                oinfo.oi_oa-&amp;gt;o_id = lsr-&amp;gt;lsr_oid;
                oinfo.oi_oa-&amp;gt;o_seq = lsr-&amp;gt;lsr_oseq;
                oinfo.oi_oa-&amp;gt;o_uid = lsr-&amp;gt;lsr_uid;
                oinfo.oi_oa-&amp;gt;o_gid = lsr-&amp;gt;lsr_gid;
        }

        oinfo.oi_oa-&amp;gt;o_valid |= (OBD_MD_FLID | OBD_MD_FLUID | OBD_MD_FLGID |
                                 OBD_MD_FLCOOKIE); &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;
        oinfo.oi_oa-&amp;gt;o_valid = OBD_MD_FLID | OBD_MD_FLGROUP; &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt; bad !!
        oinfo.oi_oa-&amp;gt;o_lcookie = *cookie;
        oid = oinfo.oi_oa-&amp;gt;o_id;

        rc = filter_setattr(exp, &amp;amp;oinfo, NULL);
        OBDO_FREE(oinfo.oi_oa);

        if (rc == -ENOENT) {
                CDEBUG(D_RPCTRACE, &quot;object already removed, send cookie\n&quot;);
                llog_cancel(ctxt, NULL, 1, cookie, 0);
                RETURN(0);
        }

        if (rc == 0)
                CDEBUG(D_RPCTRACE, &quot;object &quot;LPU64&quot; is chown/chgrp\n&quot;, oid);

        RETURN(rc);
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I will push a b2_1 patch to fix this now, and see how it works.&lt;/p&gt;</comment>
                            <comment id="131003" author="gerrit" created="Wed, 21 Oct 2015 13:45:39 +0000"  >&lt;p&gt;Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/16906&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16906&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6612&quot; title=&quot;(llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6612&quot;&gt;&lt;del&gt;LU-6612&lt;/del&gt;&lt;/a&gt; obdfilter: do not overwrite OBD_MD_FLCOOKIE flag&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_1&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1e04568d15e0c623faa8256d0583552a79331a9f&lt;/p&gt;</comment>
                            <comment id="132320" author="bfaccini" created="Mon, 2 Nov 2015 11:12:15 +0000"  >&lt;p&gt;Debug patch at &lt;a href=&quot;http://review.whamcloud.com/16373&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16373&lt;/a&gt; has been abandoned.&lt;/p&gt;</comment>
                            <comment id="132623" author="manish" created="Wed, 4 Nov 2015 14:54:58 +0000"  >&lt;p&gt;Hi Bruno,&lt;/p&gt;

&lt;p&gt;I see that the patch &lt;a href=&quot;http://review.whamcloud.com/16373&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16373&lt;/a&gt; has been abandoned and that another patch is in progress at &lt;a href=&quot;http://review.whamcloud.com/#/c/16906/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/16906/&lt;/a&gt;. Do you want us to apply either of these patches, or do you need anything else from our end?&lt;/p&gt;

&lt;p&gt;Also, were you able to reproduce these issues in your test lab, and if so, did your fix patch resolve them?&lt;/p&gt;

&lt;p&gt;Thank You,&lt;br/&gt;
                  Manish&lt;/p&gt;</comment>
                            <comment id="132802" author="bfaccini" created="Thu, 5 Nov 2015 23:56:07 +0000"  >&lt;p&gt;Hello Manish,&lt;br/&gt;
The #16373 debug patch has been abandoned because the bug has been found and the real fix has been pushed as #16906.&lt;br/&gt;
Yes, I have been able to reproduce the issue/leak in the lab, but no, I have not yet had time to verify the fix. Even though the bug was quite difficult to identify, the fix is definitely obvious.&lt;br/&gt;
So it would be nice if you could give patch #16906 a try.&lt;/p&gt;
</comment>
                            <comment id="207459" author="orentas" created="Tue, 5 Sep 2017 19:22:53 +0000"  >&lt;p&gt;This is quite old and no longer an active issue.  Please close.&lt;/p&gt;</comment>
                            <comment id="207462" author="pjones" created="Tue, 5 Sep 2017 19:33:51 +0000"  >&lt;p&gt;ok - thanks Oz&lt;/p&gt;</comment>
                            <comment id="341533" author="gerrit" created="Tue, 26 Jul 2022 04:53:22 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/15654/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/15654/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6612&quot; title=&quot;(llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6612&quot;&gt;&lt;del&gt;LU-6612&lt;/del&gt;&lt;/a&gt; utils: strengthen llog_reader vs wrong format/header&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 45291b8c06eebf33d3654db3a7d3cfc5836004a6&lt;/p&gt;</comment>
                            <comment id="344350" author="gerrit" created="Tue, 23 Aug 2022 09:29:17 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/48309&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/48309&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6612&quot; title=&quot;(llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6612&quot;&gt;&lt;del&gt;LU-6612&lt;/del&gt;&lt;/a&gt; utils: strengthen llog_reader vs wrong format/header&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7e3f7d1792eba8f3001a618de48397f05046e5f6&lt;/p&gt;</comment>
                            <comment id="349934" author="gerrit" created="Mon, 17 Oct 2022 23:27:48 +0000"  >&lt;p&gt;&quot;Jian Yu &amp;lt;yujian@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48900&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48900&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6612&quot; title=&quot;(llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6612&quot;&gt;&lt;del&gt;LU-6612&lt;/del&gt;&lt;/a&gt; utils: strengthen llog_reader vs wrong format/header&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 6e98a6e0e0c6bce20c5f3cb8fae1b0ffd9532efb&lt;/p&gt;</comment>
                            <comment id="381021" author="gerrit" created="Wed, 2 Aug 2023 06:17:52 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/48900/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/48900/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6612&quot; title=&quot;(llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6612&quot;&gt;&lt;del&gt;LU-6612&lt;/del&gt;&lt;/a&gt; utils: strengthen llog_reader vs wrong format/header&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: badba63a54e905129dbdf28e31026580453ea337&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10322">
                    <name>Gantt End to Start</name>
                                            <outwardlinks description="has to be done before">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="18394" name="CATALOGS.20150707" size="5376" author="manish" created="Tue, 7 Jul 2015 16:00:07 +0000"/>
                            <attachment id="18930" name="IU_Lustre_Build_Patches_06_08_2015.txt" size="1182" author="manish" created="Fri, 18 Sep 2015 15:14:59 +0000"/>
                            <attachment id="18414" name="ddn_sr44330_lctl_dl.txt" size="10697" author="manish" created="Fri, 10 Jul 2015 20:17:33 +0000"/>
                            <attachment id="18415" name="ddn_sr44330_ls_objects.txt" size="36308" author="manish" created="Fri, 10 Jul 2015 20:17:33 +0000"/>
                            <attachment id="18474" name="find_exec_ls_alh_manish.txt" size="5025771" author="manish" created="Thu, 23 Jul 2015 17:02:57 +0000"/>
                            <attachment id="18416" name="llog_reader.txt" size="981" author="manish" created="Fri, 10 Jul 2015 20:17:33 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 1 Dec 2015 14:42:33 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxdjj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 18 May 2015 14:42:33 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>