[LU-6612] (llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile Created: 18/May/15  Updated: 26/Sep/23  Resolved: 05/Sep/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.6
Fix Version/s: Lustre 2.16.0, Lustre 2.15.4

Type: Bug Priority: Minor
Reporter: Manish Patel (Inactive) Assignee: Bruno Faccini (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Environment:

Lustre Server 2.1.6


Attachments: File CATALOGS.20150707     Text File IU_Lustre_Build_Patches_06_08_2015.txt     Text File ddn_sr44330_lctl_dl.txt     Text File ddn_sr44330_ls_objects.txt     Text File find_exec_ls_alh_manish.txt     Text File llog_reader.txt    
Issue Links:
Gantt End to Start
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hi

We are seeing that the MDT occupies a lot of disk space, and most of that space is taken by the OBJECTS/* files on the MDT. We took a backup of the MDT, restored it in the test lab, deleted all the files residing in OBJECTS/*, removed the CATALOGS file, remounted with -t lustre, and collected the logs. Below is a snippet showing the pattern of log messages generated for each removed file.

 00000040:00020000:0.0:1431530471.926250:0:17345:0:(llog_obd.c:320:cat_cancel_cb())
 Cannot find handle for log 0x2793959b: -116
 00000040:00080000:0.0:1431530471.926254:0:17345:0:(llog_obd.c:346:cat_cancel_cb())
 cancel log 0x2793959b:f6d1b675 at index 58627 of catalog 0x13600004
 00000040:00080000:0.0:1431530471.926255:0:17345:0:(llog_obd.c:315:cat_cancel_cb())
 processing log 0x2792f036:f6d21832 at index 58628 of catalog 0x13600004
 00000040:00020000:0.0:1431530471.926259:0:17345:0:(llog_lvfs.c:616:llog_lvfs_create())
 error looking up logfile 0x2792f036:0xf6d21832: rc -116
 00000040:00020000:0.0:1431530471.926261:0:17345:0:(llog_cat.c:174:llog_cat_id2handle())
 error opening log id 0x2792f036:f6d21832: rc -116
 00000040:00020000:0.0:1431530471.926262:0:17345:0:(llog_obd.c:320:cat_cancel_cb())
 Cannot find handle for log 0x2792f036: -116
 00000040:00080000:0.0:1431530471.926265:0:17345:0:(llog_obd.c:346:cat_cancel_cb())
 cancel log 0x2792f036:f6d21832 at index 58628 of catalog 0x13600004
 00000040:00080000:0.0:1431530471.926267:0:17345:0:(llog_obd.c:315:cat_cancel_cb())
 processing log 0x2793959d:f6d27b18 at index 58629 of catalog 0x13600004
 00000040:00020000:0.0:1431530471.926270:0:17345:0:(llog_lvfs.c:616:llog_lvfs_create())
 error looking up logfile 0x2793959d:0xf6d27b18: rc -116 ...

This seems to be the Lustre bug related to LU-1749, which reports false error messages even when the CATALOG file is deleted. If not, are we seeing any issue here related to deleting files from the OBJECTS directory? Is that going to affect the MDT data or lead to any data corruption?

Thank You,
Manish



 Comments   
Comment by Bruno Faccini (Inactive) [ 19/May/15 ]

Since the format of the following log message confirms that you are running with the patch for LU-1749 in 2.1.6:

 00000040:00020000:0.0:1431530471.926262:0:17345:0:(llog_obd.c:320:cat_cancel_cb())
 Cannot find handle for log 0x2792f036: -116

I think you are safe.
Do you still see these messages ?

Comment by Manish Patel (Inactive) [ 19/May/15 ]

Hi Bruno,

Yes, I still see those messages. On the other hand, do we still have to worry about the file system data, or do these error messages have nothing to do with the consistency of the data? Can you confirm this?

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 19/May/15 ]

Still with the same log names/ids and index numbers ?

Hmm, in fact, after re-reading your submission text, I wonder whether your OSTs were stopped at the time you removed the files from the OBJECTS directory and the CATALOG file on your MDT? Because if not, the messages may come from old/invalid LLOG references from the OSTs. If that is the case, you may find corresponding messages on the OSSs; can you check?

Last, are you using ChangeLogs ?

Comment by Andreas Dilger [ 19/May/15 ]

Manish, the OBJECTS files are still in use in Lustre 2.1, so they can't just be deleted - they are used to track files being deleted on the MDT but not yet deleted on the OSTs. It is only with Lustre 2.5 and later that the OBJECTS files are obsolete.

I'd agree with Bruno - it seems that these OBJECTS files are not being cleaned up because either the ChangeLog has been enabled but there is no consumer, or there is a missing or disabled OST that has objects deleted (or ownership changed) and the MDS is logging these changes until the OST is returned.

Bruno, it is worthwhile to check whether disabling all ChangeLog consumers will delete the ChangeLog records, or if they are left behind?

Comment by Manish Patel (Inactive) [ 19/May/15 ]

Hi Andreas,

The changelog was enabled for a period of time until it filled up, which resulted in client mount issues ("Lustre clients receive -14 "Bad address" from MDS" [REF:1638415019758]), and then the changelog user was unregistered to fix that.

The current state is as follows:

[root@dc2mds01 ~]# cat /proc/fs/lustre/mdd/dc2-MDT0000/changelog_users
current index: 10970114278
ID index

[root@dc2mds01 ~]# cat /proc/fs/lustre/mdd/dc2-MDT0000/changelog_mask
MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO OPEN CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 21/May/15 ]

Manish,
My initial inclination is that the messages are harmless but I need to investigate more fully to confirm. I have several possible theories as to what kind of scenario could cause this situation. I will be in touch again when I have a clearer picture.
BTW, you did not answer my previous question about whether the OSTs were stopped at the time you removed the OBJECTS/* and CATALOG files on the MDT?

Comment by Manish Patel (Inactive) [ 22/May/15 ]

Hi Bruno,

The test which the customer performed was on an isolated test server and network; no OSTs were used in those tests.

Here is another question from customer.

"I wasn't aware that interacting with the MDT while mounted as ldiskfs could trigger Lustre operations with the OSTs. I suppose I need to do a bit more reading."

Currently the customer is planning, during the upcoming maintenance, to resize the grown oi.16 file to free up some space.

If the resize process of the oi.16 file creates issues, can we simply restore the original oi.16 file, or must we restore the entire dd-based MDT backup?

In either restoration case, how would we track/clean up any orphaned objects on the OSTs? Could we record enough information about every file we create during testing that we could use to surgically clean up? If so, how? Or, must we perform a full file system lfsck? Would we run into problems if we reverted to our backup and left a few hundred orphaned OST objects?

1. Are we OK to delete those files from the OBJECTS directory with Lustre v2.1.6, to reclaim the extra space occupied by files residing in the OBJECTS directory on the MDT, or should we upgrade Lustre to v2.5.x or higher and then delete those files from the OBJECTS directory?
2. How would we track/clean up any orphaned objects on the OSTs with Lustre v2.1.6? Is lfsck an option here? If so, please provide the procedure to clean up the orphan objects.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 27/May/15 ]

Manish,
I am sorry, but I am puzzled here: do you mean that you got the original messages/errors for this ticket after restoring an MDT backup, removing the OBJECTS/* and CATALOG files, and then mounting it on its own (i.e., without the original OSTs)? And this just to test/see what would happen? Did I understand the situation correctly? Because if so, I don't think this "test" is representative of reality.

And now you speak about removing the OI.16.* files? Can you explain what the purpose of this test is? To prepare a 2.5 upgrade with a manual pre-cleaning of the MDT, and a possible downgrade in case of problems? If that is the case, just do the upgrade and then run lfsck; it will do the cleanup for you, and better than all of these manual and risky actions, you can trust me!

To answer some of your numerous questions: if you remove all OI.16.* files, they will be recreated upon MDT start/mount. Removing the OBJECTS/* files and the CATALOG file should work, even though there may be a few possible side effects/messages (like orphaned objects on OSTs, ...). If you want to record enough information to be able to manually clean up orphans on the OSTs after an MDT restore, the best choice is RobinHood; the "offline" lfsck version available in 2.1 can also do it for you, and the "online" lfsck version available in 2.5 will do it much better and more easily!

Comment by Manish Patel (Inactive) [ 01/Jun/15 ]

Hi Bruno,

Yes, you understood correctly; we were just trying to see if there is any risk involved in the upcoming maintenance process. We have decided to postpone removing the files from the OBJECTS directory and will work on that part with the Lustre 2.5.x releases.

Yes, we hit the bug DDN-77, and to fix that issue we have to go through this process. Right now the customer is still testing Lustre v2.5 and wants to keep the move to v2.5 on hold for now, so until then we have to go through this process of fixing the oi.16 file under Lustre v2.1.6.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 16/Jun/15 ]

Hello Manish, sorry to be late on this.
I am not sure I fully understand your last comment, but based on my previous indications and the DDN-77 content, I am definitely convinced that any removal of OBJECTS/* and OI files should only occur after the v2.5 upgrade, and after a full filesystem stop/umount of clients and servers.

Comment by Manish Patel (Inactive) [ 17/Jun/15 ]

Hi Bruno,

Well, we already did, e.g., "mv OBJECTS OBJECTS.DDN-191; mv CATALOGS CATALOGS.DDN-191;" when we encountered issues hitting the directory index limit in the DDN-191 ticket. Currently the customer is still on Lustre v2.1.6.

Since we have already renamed the original OBJECTS directory, which contains 487G of llogs: is it safe to delete the old OBJECTS directory to reclaim space during our next maintenance window?

Why do logs continue to accumulate in the OBJECTS directory, and is there something we can do to stop it?

Thank You,
Manish

Comment by Manish Patel (Inactive) [ 23/Jun/15 ]

Hi Bruno, Andreas,

Any updates/comments on my previous questions? The next maintenance window is approaching and we have to plan accordingly, so please advise on this issue.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 26/Jun/15 ]

Hello Manish, sorry to be late on this.
I definitely need more information to determine what is going on at your site that causes the new plain LLOGs to accumulate under the OBJECTS directory:
_ "lctl dl" cmd output on MDS
_ "ls -li" of MDS's /OBJECTS directory output, when ldiskfs-mounted
_ "llog_reader CATALOGS" cmd output, when ldiskfs-mounted
_ binary copy of CATALOGS file

And yes, since you have indicated that you are already running fine with the moved OBJECTS directory and CATALOGS file, it should be fine to remove the old/renamed OBJECTS directory, but I would suggest you do this during dedicated downtime with all clients/servers unmounted.

Comment by Manish Patel (Inactive) [ 01/Jul/15 ]

Hi Bruno,

The customer tried to run that command on their test cluster with the backup MDT data and got the error below:

dc-oss01# llog_reader CATALOGS
Memory Alloc for recs_buf error.
Could not pack buffer; rc=-12

Do you know any alternative workaround to get that output? Their test cluster has 48G of RAM and the production cluster has 128G.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 06/Jul/15 ]

Hello Manish,
I don't think this error is related to a real lack of memory; it is more likely the consequence of llog_reader wrongly parsing the CATALOGS file format, where the following code:

        /* the llog header not countable here.*/
        recs_num = le32_to_cpu((*llog)->llh_count)-1;

        recs_buf = malloc(recs_num * sizeof(struct llog_rec_hdr *));

can easily return -ENOMEM if the (*llog)->llh_count value is mistakenly read as 0.
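
For illustration only, here is a minimal sketch of the kind of defensive check that would avoid the misleading -ENOMEM (this is my assumption about how the parsing fails, not the actual change later pushed for llog_reader): when llh_count is read as 0, recs_num goes negative and the malloc() size wraps to a huge value, so the allocation fails with -12.

        /* the llog header is not countable here */
        recs_num = le32_to_cpu((*llog)->llh_count) - 1;
        if (recs_num <= 0) {
                /* hypothetical guard (sketch only): a file that is not a plain
                 * LLOG can yield llh_count == 0, making recs_num negative so
                 * that the malloc() below fails with a misleading -ENOMEM */
                fprintf(stderr, "invalid record count, not a valid llog header?\n");
                return -EINVAL;
        }

        recs_buf = malloc(recs_num * sizeof(struct llog_rec_hdr *));
        if (recs_buf == NULL)
                return -ENOMEM;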

Can you attach the "binary copy of the CATALOGS file" that I already asked for? And also the other information I requested?

Also, have you been able to remove the old/moved OBJECTS directory content/files?

Comment by Manish Patel (Inactive) [ 06/Jul/15 ]

Hi Bruno,

Can you be more specific about the "binary copy of the CATALOGS file"? Do we just need to mount the MDT with ldiskfs and use a normal cp command to copy that CATALOGS file and send it to you, or is there some different process involved?

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 07/Jul/15 ]

Correct, just a raw copy of CATALOGS file.

Comment by Manish Patel (Inactive) [ 07/Jul/15 ]

Here is the CATALOGS file from MDT.

Comment by Manish Patel (Inactive) [ 07/Jul/15 ]

Hi Bruno,

Attached is a copy of the current CATALOGS file.

https://jira.hpdd.intel.com/secure/attachment/18394/CATALOGS.20150707

The llog_reader command failed with the following error:

Memory Alloc for recs_buf error.
Could not pack buffer; rc=-12

So please advise how to get that output and what needs to be done to get rid of the "recs_buf error".

We removed the old CATALOGS file and OBJECTS directory, which we renamed about a month ago. The customer site has maintenance scheduled for today, so it would be nice to have the next steps before we run out of this maintenance window.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 07/Jul/15 ]

Thanks for the CATALOGS binary file, but what about the :
_ "lctl dl" cmd output on MDS
_ "ls -li" of MDS's /OBJECTS directory output, when ldiskfs-mounted
and the other information I already asked you to provide?
That information is also required to understand what is going on.

Concerning the "recs_buf error" from "llog_reader CATALOGS" cmd, they should come from the fact "llog_reader" don't know know howto interpret CATALOGS formet, but I can manage to get interesting stuff from CATALOGS binary file.

Comment by Manish Patel (Inactive) [ 10/Jul/15 ]

The rest of the data Bruno requested.

Comment by Manish Patel (Inactive) [ 10/Jul/15 ]

Hi Bruno,

Here are the other requested files.

https://jira.hpdd.intel.com/secure/attachment/18414/ddn_sr44330_lctl_dl.txt
https://jira.hpdd.intel.com/secure/attachment/18415/ddn_sr44330_ls_objects.txt
https://jira.hpdd.intel.com/secure/attachment/18416/llog_reader.txt

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 20/Jul/15 ]

Hello Manish,
According to the latest files/information you have provided, here is what I think:
_ the llog_reader.txt file shows the oi.16 file as being huge; as we already discussed, how to cure this has already been described to you in DDN-77.

_ the llog_reader.txt file also again shows the llog_reader error when it is used on the CATALOGS file, which is not of the expected LLOG format. I will push a patch soon to have llog_reader handle such a case gracefully.

_ the ddn_sr44330_ls_objects.txt file shows that there are 336 small "100uvw:68b06xyz" files in the OBJECTS/ dir (2 per OST, half of them being the only ones referenced in the CATALOGS file). All are dated "Jun 9 10:56". These should be the normal OSP sync LLOG catalogs.

_ the ddn_sr44330_ls_objects.txt file also shows 190 other, more recent files of reasonable size (all < 3MB). Are these the files you suspect of leaking and of having consumed a huge amount of space on your MDT before you removed them?

_ last, do you still see the original "(llog_obd.c:320:cat_cancel_cb()) Cannot find handle for log 0x2793959b: -116" messages from this ticket?

Comment by Manish Patel (Inactive) [ 20/Jul/15 ]

Hi Bruno,

We have already cured the huge oi.16 file issue and it has shrunk back to normal size. Regarding the rest of the log file related questions, I will have to check with the customer to get their input, and then I will update this thread.

Thank You,
Manish

Comment by Gerrit Updater [ 20/Jul/15 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/15654
Subject: LU-6612 utils: strengthen llog_reader vs wrong format/header
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bdc32b6164c08484b2f88b1a3a0bac7b9f223dea

Comment by Manish Patel (Inactive) [ 23/Jul/15 ]

Hi Bruno,

It seems that any file more than a few hours old in the OBJECTS directory is leaked. It would be nice if you could take a look at the ls -lah output from the MDT OBJECTS directory in the attached file "find_exec_ls_alh_manish.txt" for a more complete history of the number of files, their timestamps, and their sizes. The file lists most of the nearly 74k files that were removed during our last maintenance window.

https://jira.hpdd.intel.com/secure/attachment/18474/find_exec_ls_alh_manish.txt

The most recent llog_obd.c:320:cat_cancel_cb() error logged was on 7/9 for a different log:

Jul 8 00:06:11 dc2mds01 kernel: : LustreError: 8949:0:(llog_obd.c:320:cat_cancel_cb()) Cannot find handle for log 0x27924bbc: -116
Jul 8 00:06:11 dc2mds01 kernel: : LustreError: 8949:0:(llog_obd.c:320:cat_cancel_cb()) Skipped 26419 previous similar messages

Thank You,
Manish

Comment by Manish Patel (Inactive) [ 30/Jul/15 ]

Hi Bruno,

The customer has a maintenance window scheduled for next Tuesday, so can you take a look at the file listed in my previous post and see if you need anything else collected from the customer during the maintenance downtime?

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 31/Jul/15 ]

Hello Manish,
In the last find_exec_ls_alh_manish.txt you provided, I don't see any files dated within the last two months (June/July)...
Also, it looks quite different from the previous ddn_sr44330_ls_objects.txt content I already described!
Can you detail how/when these two "ls" command reports were generated?

Comment by Manish Patel (Inactive) [ 31/Jul/15 ]

Hi Bruno,

The customer took the MDT backup, loaded the MDT data on their test cluster, and mounted it with ldiskfs; that data was captured on May-12-2015. The command used to generate "find_exec_ls_alh_manish.txt" is as below:

find OBJECTS -name "[0-9][0-9]*:[0-9a-f][0-9a-f]*" -exec ls -lah {} \;

Let me know if you need anything else to troubleshoot further.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 03/Aug/15 ]

Hmm, I was also asking for some feedback about the differences I found in the previous ddn_sr44330_ls_objects.txt file/information...
BTW, if OBJECTS files still appear to leak at the customer site, it would be helpful if you could attach some of them (old + recent), along with a recent version of the CATALOGS file.

Comment by Manish Patel (Inactive) [ 04/Aug/15 ]

Hi Bruno,

The customer collected a listing of the OBJECTS directory, a copy of the CATALOGS file, and a tarball of the OBJECTS directory. Here is the new data you requested.

http://ddntsr.com/ftp/2015-08-04-SR44330_20150804.tar

Let us know if you need anything else.

Thank You,
Manish

Comment by Manish Patel (Inactive) [ 20/Aug/15 ]

Hi Bruno,

Any updates on the data which I uploaded in the previous post?

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 01/Sep/15 ]

Manish,
Sorry to be late on this, but unfortunately http://ddntsr.com/ftp/2015-08-04-SR44330_20150804.tar returns "ERROR 404: Not Found." when I try to download it. Can you verify what is going on? Thanks in advance.

Comment by Manish Patel (Inactive) [ 01/Sep/15 ]

Hi Bruno,

I have uploaded those logs to the Intel FTP site, in a directory named after this Jira case, LU-6612. Let me know if you are able to get them from there.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 03/Sep/15 ]

Adding a recent email/answer sent to the concerned people, including Manish:
=======================================================
I have started to work on the latest information from the site that Manish has provided.
I am not fully done, because LLOG analysis is quite a long process, but my first results seem to indicate that the files growing/leaking in the OBJECTS directory are OSP sync logs. If further analysis of all the LLOGs confirms this, it means that either some of the osp_sync_threads are stuck, or some chown & unlink operations on the MDT are never committed by some OSTs, or, lastly, that some OSTs are stopped.
LU-5297, LU-6687 and LU-7079 may be related.

Since Manish is in copy, I am using this email to ask him for more information:
_ does the site have 168 OSTs configured?
_ do you again confirm that none of the OSTs are stopped?
_ can you check osp_sync_thread’s state on MDS?
_ how many times has the concerned Lustre filesystem been restarted by now? It would be nice to get the exact dates.
_ do you confirm site is running with 2.1.6?

Thanks again for your help,
============================================================

Comment by Bruno Faccini (Inactive) [ 03/Sep/15 ]

I have done more work on the latest provided data, and here are the results:
_ CATALOGS file points to 168 Catalog files, presumably one for each OST/osp_sync log
_ each of these catalogs only contains 5-7 valid but sparse records pointing to plain LLOG files.
_ number of Catalog files (168) + number of all of their referenced LLOG files = number of files in OBJECTS dir
_ I have not checked all of the referenced LLOG files, but all the ones I looked at only contain a few sparse records of MDS_UNLINK_REC type.

So LU-7079 looks like the best candidate, but I need to verify whether it also applies to 2.1.6.

Comment by Manish Patel (Inactive) [ 03/Sep/15 ]

Hi Bruno,

-> Yes, this site has 168 OSTs configured.
-> Yes, none of the OSTs are stopped.
-> Yes, the site is running 2.1.6.

I will have to get the rest of the data from the customer and will post it once I have those details. How do I check the osp_sync_thread state on the MDS? Is there a command I can use?

Thank You,
Manish

Comment by Peter Jones [ 03/Sep/15 ]

Bruno

LU-7079 was only introduced with Lustre 2.4.x, so that cannot be the exact issue at play on this 2.1.6 deployment, and I think that discounts the request to check the osp_sync_thread state on the MDS. Are the other items still required? Anything else that DDN should gather to support your ongoing investigation?

Peter

Comment by Bruno Faccini (Inactive) [ 03/Sep/15 ]

It is established that our 2.1 problem cannot be related/linked to LU-7079, and my previous request to check the "osp_sync_thread state on MDS" was therefore inaccurate.
Would it be possible for you to provide the MDS and OSS syslogs/console history for the past 2-3 months?
In parallel, I am parsing the 2.1.6 Lustre source code to find any code path that may explain such a leak.

Comment by Manish Patel (Inactive) [ 03/Sep/15 ]

Hi Bruno,

I have uploaded the logs to the Intel FTP site under the LU-6612 directory, with the file name "2015-09-03-SR44330_es_lustre_showall_2015-09-03_151236.tar.bz2".

The file system has been restarted many times, so it is going to be hard to get all of that data, but here is the recent activity:

06-03-2015: The file system was taken down for the oi.16 shrink procedure and restarted during a scheduled maintenance window.
06-07-2015 - 06-09-2015: The file system MDS experienced issues which required the MDT to be unmounted multiple times. It was during this unplanned outage that Andreas instructed us to rename the OBJECTS directory on the MDT.
07-07-2015: The file system was restarted during a scheduled maintenance window.
08-04-2015: The file system was taken offline during a regular scheduled maintenance window to collect MDT OBJECTS and CATALOGS file information for Intel.
09-02-2015: The file system was restarted to accommodate an MDT backup.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 04/Sep/15 ]

Thanks for providing the logs.
It looks like the OSSs/OSTs quite frequently report the following error/message sequence:

messages:Sep  2 04:05:50 dc2oss12 kernel: : LustreError: 14888:0:(ldlm_resource.c:1090:ldlm_resource_get()) lvbo_init failed for resource 110829994: rc -2
messages:Sep  2 04:05:50 dc2oss12 kernel: : LustreError: 7212:0:(filter.c:3136:__filter_oa2dentry()) dc2-OST0075: filter_sync on non-existent object: 110829994:0 
messages:Sep  2 04:05:50 dc2oss12 kernel: : LustreError: 7212:0:(ost_handler.c:1648:ost_blocking_ast()) Error -2 syncing data on lock cancel

I am presently analyzing the related code path to see if it can be the cause of the leaking MDS_UNLINK_REC records in the OSP-SYNC LLOGs.

Comment by Bruno Faccini (Inactive) [ 07/Sep/15 ]

There are also the following error messages from filter_destroy(), which could be much more closely related:

$ grep filter_destroy es_lustre_showall_2015-09-03_151236/*oss*/logs/messages-* 
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 03:23:13 dc2oss01 kernel: : [<ffffffffa0ca5a24>] filter_destroy+0xf24/0x13d0 [obdfilter]
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 03:24:56 dc2oss01 kernel: : LustreError: 15252:0:(filter.c:4267:filter_destroy()) error on commit, err = -30
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 03:24:56 dc2oss01 kernel: : LustreError: 15252:0:(filter.c:4267:filter_destroy()) Skipped 1 previous similar message
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:45 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy())  dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:45 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy())  dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:57 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy())  dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:57 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy()) Skipped 7 previous similar messages
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:59 dc2oss01 kernel: : LustreError: 16189:0:(filter.c:4151:filter_destroy())  dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:59 dc2oss01 kernel: : LustreError: 16189:0:(filter.c:4151:filter_destroy()) Skipped 9 previous similar messages
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:20:01 dc2oss01 kernel: : LustreError: 16189:0:(filter.c:4151:filter_destroy())  dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss02/logs/messages-20150816:Aug 15 10:56:03 dc2oss02 kernel: : [<ffffffffa0c2f218>] ? filter_destroy+0x718/0x13d0 [obdfilter]
es_lustre_showall_2015-09-03_151236/dc2oss08/logs/messages-20150823:Aug 18 06:29:24 dc2oss08 kernel: : [<ffffffffa0cb2218>] filter_destroy+0x718/0x13d0 [obdfilter]
es_lustre_showall_2015-09-03_151236/dc2oss10/logs/messages-20150809:Aug  5 14:31:51 dc2oss10 kernel: : LustreError: 15496:0:(filter.c:1621:filter_destroy_internal()) destroying objid 94281010 ino 36943382 nlink 1 count 3

and these could be the starting point of the leaking MDS_UNLINK_REC records and of the plain LLOG files containing them.

Since the problem seems to occur on any OST, I have decided to concentrate on OST0000; this OST's catalog presently contains only 5 references/records of plain LLOGs:

Header size : 8192
Time : Tue Jun  9 07:48:06 2015
Number of records: 5
Target uuid :
-----------------------
#01 (064)ogen=68B06AFD name=0:1048793
#102 (064)ogen=8E05AC60 name=0:1049057
#103 (064)ogen=B52DE2C3 name=0:1049144
#104 (064)ogen=EEE3D5B name=0:1049420
#193 (064)ogen=B9CE5F6A name=0:1049449

If we look at the first one, it contains (like the others) a few valid records:

rec #1 type=10612404 len=40   <<<<< mainly MDS_UNLINK_REC records
rec #2 type=10612404 len=40
rec #3 type=10612404 len=40
rec #4 type=10612404 len=40
rec #5 type=10612404 len=40
rec #6 type=10612404 len=40
rec #7 type=10612404 len=40
rec #8 type=10612404 len=40
rec #9 type=10612404 len=40
rec #10 type=10612404 len=40
rec #11 type=10612404 len=40
rec #12 type=10612404 len=40
rec #13 type=10612404 len=40
rec #14 type=10612404 len=40
rec #15 type=10612404 len=40
rec #19 type=10612404 len=40
rec #20 type=10612404 len=40
rec #21 type=10612404 len=40
rec #24 type=10612404 len=40
rec #27 type=10612404 len=40
rec #28 type=10612404 len=40
rec #29 type=10612404 len=40
rec #30 type=10612404 len=40
rec #32 type=10612404 len=40
rec #33 type=10612404 len=40
rec #37 type=10612404 len=40
rec #38 type=10612404 len=40
rec #39 type=10612404 len=40
rec #42 type=10612404 len=40
rec #43 type=10612404 len=40
rec #44 type=10612404 len=40
rec #45 type=10612404 len=40
rec #46 type=10612404 len=40
rec #47 type=10612404 len=40
rec #52 type=10612404 len=40
rec #53 type=10612404 len=40
rec #54 type=10612404 len=40
rec #55 type=10612404 len=40
rec #56 type=10612404 len=40
rec #57 type=10612404 len=40
rec #58 type=10692401 len=56 <<<<< but also a MDS_SETATTR64_REC
rec #59 type=10612404 len=40
rec #60 type=10612404 len=40
rec #61 type=10612404 len=40
rec #62 type=10612404 len=40
rec #63 type=10612404 len=40
rec #64 type=10612404 len=40
rec #65 type=10612404 len=40
rec #66 type=10612404 len=40
rec #67 type=10612404 len=40
rec #68 type=10612404 len=40
rec #69 type=10612404 len=40
rec #70 type=10612404 len=40
rec #71 type=10612404 len=40
rec #72 type=10612404 len=40
rec #73 type=10612404 len=40
rec #74 type=10612404 len=40
rec #75 type=10612404 len=40
rec #76 type=10612404 len=40
rec #77 type=10612404 len=40
rec #78 type=10612404 len=40
rec #79 type=10612404 len=40
rec #80 type=10612404 len=40
rec #81 type=10612404 len=40
rec #82 type=10612404 len=40
rec #83 type=10612404 len=40
rec #84 type=10612404 len=40
rec #85 type=10612404 len=40
rec #86 type=10612404 len=40
rec #87 type=10612404 len=40
rec #88 type=10612404 len=40
rec #89 type=10612404 len=40

Thus, could you check whether OBJID 0x532cb79 (87214969) still exists on OST0000, and also whether OBJID 0x532ccc6 (87215302) still exists on OST0000, the latter with uid 0xb8517 (754967) and gid 0xd2 (210)?
To do so, you can use debugfs on the associated device and look for the OBJIDs with the following commands:

debugfs -R "stat O/0/d25/87214969" /dev/...
debugfs -R "stat O/0/d6/87215302" /dev/...

It would also be very helpful if you could again provide a similar package containing the MDS's CATALOGS and OBJECTS/* files, so I can understand the problem's evolution/consequences based on the recently provided logs.

Comment by Manish Patel (Inactive) [ 08/Sep/15 ]

Hi Bruno,

Can you provide more details on what exactly you need for

"Would be also very helpful if you could again provide a similar package containing MDS's CATALOGS and OBJECTS/* files, for me to understand problem's evolution/consequences based on the recently provided logs content."

Also, if you need any output related to uid 0xb8517 (754967) and gid 0xd2 (210), let us know the command to gather that data.

I will ask the customer to get the debugfs output and update the post.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 09/Sep/15 ]

> Can you provide more details on what exactly you need for
Exactly the same material/information as what you provided and commented on in this ticket on 04/Aug/15, but collected now.

> Also, if you need any output related to uid 0xb8517 (754967) and gid 0xd2 (210), let us know the command to gather that data.
No, that will be part of the information returned for OBJID 0x532ccc6 (87215302).

Comment by Manish Patel (Inactive) [ 09/Sep/15 ]

Hi Bruno,

Here is the debugfs output which you requested; the rest of the data we can collect during the next maintenance window, happening in October.

[root@dc2oss01 ~]# debugfs -R "stat O/0/d25/87214969" /dev/mapper/ost_dc2_0 2>&1 | tee /tmp/sr44330_87214969.log
debugfs 1.42.7.wc1 (12-Apr-2013)
O/0/d25/87214969: File not found by ext2_lookup
[root@dc2oss01 ~]# debugfs -R "stat O/0/d6/87215302" /dev/mapper/ost_dc2_0 2>&1 | tee /tmp/sr44330_87215302.log
debugfs 1.42.7.wc1 (12-Apr-2013)
Inode: 94402427 Type: regular Mode: 0666 Flags: 0x80000
Generation: 1758127277 Version: 0x00000000:21734617
User: 754967 Group: 210 Size: 2914894
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 5696
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x55770077:00000000 -- Tue Jun 9 11:04:23 2015
atime: 0x55770086:00000000 -- Tue Jun 9 11:04:38 2015
mtime: 0x55770077:00000000 -- Tue Jun 9 11:04:23 2015
crtime: 0x5577003b:611b92d8 -- Tue Jun 9 11:03:23 2015
Size of extra inode fields: 28
Extended attributes stored in inode body:
fid = "fe 0d 00 5b 76 f2 10 02 00 00 00 00 00 00 00 00 c6 cc 32 05 00 00 00 00
00 00 00 00 00 00 00 00 " (32)
fid: objid=87215302 seq=0 parent=[0x210f2765b000dfe:0x0:0x0] stripe=0
EXTENTS:
(0-255):2009370624-2009370879, (256-711):2009379072-2009379527

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 10/Sep/15 ]

Thanks Manish,
It is really interesting to see that, for the two leaked Unlink and Setattr records respectively, the first object is no longer present (but it may simply have been unlinked as expected!), while the second is present and has its uid/gid set, as expected, to the values in its LLOG record.
Based on this latest information, I have looked further into the 2.1.6 source code and I think I now have a good idea of how LLOG records may leak. I will now try to set up a reproducer and push a debug patch.
More to come soon.

Comment by Gerrit Updater [ 11/Sep/15 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/16373
Subject: LU-6612 obdfilter: more debug upon filter_find_olg() failure
Project: fs/lustre-release
Branch: b2_1
Current Patch Set: 1
Commit: 28b1448ac3f757c7df42cd15208cd36e4b20d4eb

Comment by Manish Patel (Inactive) [ 11/Sep/15 ]

Hi Bruno,

So is that patch the fix, or is it just a debug patch?

Thank You,
Manish

Comment by Peter Jones [ 12/Sep/15 ]

Manish

Based on the comment above and the commit message, it seems that this is a debug patch.

Peter

Comment by Manish Patel (Inactive) [ 14/Sep/15 ]

Hi Bruno,

Were you able to create a reproducer, and did you find anything?

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 15/Sep/15 ]

Hello Manish,
Peter is right; my patch is only meant to give more information about the code paths where I suspect we can leak LLOG records, ultimately causing the orphan LLOG files to accumulate under the OBJECTS directory.
I am presently working on reproducing the problem while running with my patch, and will let you know soon about any progress/success.

Comment by Bruno Faccini (Inactive) [ 18/Sep/15 ]

Unfortunately, I am presently unable to reproduce the problem, even when trying to create scenarios likely to generate the previously suspected messages...
On the other hand, working on the reproducer has allowed me to find other code paths where there seems to be a possibility of record leakage in the MDS<->OST sync LLOGs. The recovery code may also be involved, because the MDS should be able to ask the OSTs to replay such recorded operations upon restart...
Because of this, I am presently writing an enhanced version of my previous debug patch. If its further testing is successful, do you think there is a chance it could be installed on-site? The MDS/OSSs are the targets.

Also, about the 2.1.6 version you have reported for this problem: the logs you provided state "Lustre: Build Version: EXAScaler-ddn1.0--PRISTINE-2.6.32-358.11.1.el6_lustre.es359.devel.x86_64". Can you give me more details about its exact content (the tag it is based on, additional patches, ...)?

And again, as I have already requested in my two previous updates on 7 and 9/Sep/15, can you provide « exactly the same material/information as what you provided and commented on in this ticket on 04/Aug/15, but collected now »?

Comment by Manish Patel (Inactive) [ 18/Sep/15 ]

Hi Bruno,

Yes, if you can provide that new debug patch within a few days, we can build new RPMs, test them, and install them at the next maintenance window, which will most likely happen on October 06. At the same time we can collect the same set of data that was provided on 04/Aug/15.

Here are more details about the patches applied to that Lustre build.

https://jira.hpdd.intel.com/secure/attachment/18930/IU_Lustre_Build_Patches_06_08_2015.txt

Thank You,
Manish

Comment by Manish Patel (Inactive) [ 05/Oct/15 ]

Hi Bruno,

Let us know if you need anything else other than the data listed below:

a listing of the OBJECTS directory, a copy of the CATALOGS file, and a tarball of the OBJECTS directory.

The customer is having downtime tomorrow, so if you have that enhanced version of the patch ready, please let us know.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 06/Oct/15 ]

Hello Manish,
> a listing of the OBJECTS directory, a copy of the CATALOGS file, and a tarball of the OBJECTS directory.
Yes, that will be very helpful, and even more so if you can collect the same information after the filesystem has been fully stopped, and again after the filesystem has been fully restarted (and any recovery has completed).

If you could also ensure that the whole restart is run with the default debug mask (ioctl neterror warning error emerg ha config console lfsck) set on the MDS, and then take a full trace at the end of the restart/recovery, that would also be very helpful.

All MDS/OSS syslogs/messages taken some time after the restart will also be required.

Last, you can add my debug patch, available at http://review.whamcloud.com/16373, to the Lustre version to be used for the restart; it will print more debug messages/information on some identified code paths that may cause the unlink/setattr LLOG record leakage.

Comment by Bruno Faccini (Inactive) [ 06/Oct/15 ]

If the customer agrees to set the MDS debug mask as I have requested (and also to grow the size of the debug log buffer), I need to detail the procedure to apply it.
To be sure that you will catch and log the whole restart and recovery process on the MDS node, you will need to first load the libcfs module by running the "modprobe libcfs" command, then change/verify the debug log mask (echo "ioctl neterror warning error emerg ha config console lfsck" > /proc/sys/lnet/debug) and buffer size (echo 8192 > /proc/sys/lnet/debug_mb, for 8GB; you can use less if the MDS memory size is small), before mounting the MDT.
Then you will need to dump the debug log to a disk file using the "lctl dk /tmp/lustre_debug.log" command, right after the filesystem start has completed.

Also, you can reset the mask and buffer size to their original values after the filesystem has been fully started.

Comment by Manish Patel (Inactive) [ 06/Oct/15 ]

Hi Bruno,

We need more clarification about this part.

"Yes, that will be very helpful, but even more if you can get the same infos after filesystem has been fully stopped, and again after filesystem has been fully re-started (and any recovery is completed)."

Do you mean that after the full stop we collect the data using ldiskfs, then bring the file system back online, and once recovery has completed we take the file system down again, mount it with ldiskfs, and collect the same set of data? Or do you have another way to collect that data while the file system is up and online, using 'debugfs' or some other method? Please clarify this part and provide detailed steps to collect that data.

Also, we won't be able to deploy the patch today since it has to go through their test lab first; it will probably be deployed at the next maintenance window.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 06/Oct/15 ]

You will not need to stop the filesystem again; you can safely mount the MDT as ldiskfs, but only from the same node/MDS, in parallel with normal operations, as long as you do not modify anything on it. You can also use debugfs if you want, but dumping all of the OBJECTS/ dir files could be a long process.

OK, for the patch, I will let you know if I modify/enhance it in the meantime.

Comment by Manish Patel (Inactive) [ 06/Oct/15 ]

Hi Bruno,

Here are the logs which you requested.

http://ddntsr.com/ftp/2015-10-06-SR44330_20151006.tar

It contains the following files:

SR44330_ls-li_MDT_OBJECTS_20151006.txt
SR44330_MDT_CATALOGS_201501006.txt
SR44330_MDT_OBJECTS_20151006.tar.gz
lustre_debug.20151006.log
SR44330_ls-li_MDT_OBJECTS_20151006b.txt
SR44330_MDT_CATALOGS_201501006b.txt
SR44330_MDT_OBJECTS_20151006b.tar.gz

The files with a suffix of *b.txt and *b.tar.gz were collected after the customer brought the file system back online.

Here are the syslogs if needed.
http://ddntsr.com/ftp/2015-10-06-SR44330_es_lustre_showall_2015-10-06_125006.tar.bz2

Let us know if you need anything else.

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 08/Oct/15 ]

Thanks Manish, I am presently reviewing all of this new information and will let you know ASAP about any progress.

Comment by Bruno Faccini (Inactive) [ 21/Oct/15 ]

Sorry, it took me some time to parse all of this new data...
The MDS debug log taken upon FS restart, when the Unlink/Setattr LLOG records are expected to be replayed between the MDS and each OST, is particularly interesting because it shows that the concerned LLOG files of each OST are parsed. More debug levels would be needed to understand why, but their records are not canceled/removed, so the LLOG files are kept.
Thus I have decided to again try to reproduce the problem in-house using the following scenario:
_ Client creates a file
_ deactivate the MDT/OST connection
_ Client removes the file
_ stop the MDT
_ set full debug
_ restart the MDT
And this has allowed me to find that even though the MDS/OST Unlink/Setattr LLOG records are parsed/replayed, they are effectively never canceled:

00002000:00000001:27.0:1445357157.777095:0:5009:0:(filter_log.c:266:filter_recov_log_mds_ost_cb()) Process entered
00002000:00000001:27.0:1445357157.777097:0:5009:0:(filter_log.c:166:filter_recov_log_unlink_cb()) Process entered
00002000:00000010:27.0:1445357157.777099:0:5009:0:(filter_log.c:169:filter_recov_log_unlink_cb()) slab-alloced '((oa))': 208 at ffff8808362f4cb0.
00002000:00000001:27.0:1445357157.777103:0:5009:0:(filter.c:4119:filter_destroy()) Process entered
00002000:00000001:27.0:1445357157.777105:0:5009:0:(filter_capa.c:127:filter_auth_capa()) Process entered
00002000:00000001:27.0:1445357157.777107:0:5009:0:(filter_capa.c:141:filter_auth_capa()) Process leaving (rc=0 : 0 : 0)
00002000:00000002:27.0:1445357157.777110:0:5009:0:(filter.c:4133:filter_destroy()) lustre-OST0000: filter_destroy(group=0,oid=2)
00002000:00000001:27.0:1445357157.777113:0:5009:0:(filter.c:1474:filter_fid2dentry()) Process entered
00002000:00000002:27.0:1445357157.777116:0:5009:0:(filter.c:1499:filter_fid2dentry()) looking up object O/d2/2
00002000:00000002:27.0:1445357157.777177:0:5009:0:(filter.c:1518:filter_fid2dentry()) got child objid 2: ffff880834151740, count = 1
00002000:00000001:27.0:1445357157.777180:0:5009:0:(filter.c:1522:filter_fid2dentry()) Process leaving (rc=18446612167547754304 : -131906161797312 : ffff880834151740)
00002000:00000002:27.0:1445357157.777183:0:5009:0:(filter.c:4142:filter_destroy()) destroying non-existent object 2:0
00002000:00000001:27.0:1445357157.777185:0:5009:0:(filter.c:4161:filter_destroy()) Process leaving via cleanup (rc=18446744073709551614 : -2 : 0xfffffffffffffffe)
00002000:00000002:27.0:1445357157.777187:0:5009:0:(filter.c:221:f_dput()) putting 2: ffff880834151740, count = 0
00002000:00000001:27.0:1445357157.777191:0:5009:0:(lustre_quota.h:653:lquota_adjust()) Process entered
00040000:00000001:27.0:1445357157.777193:0:5009:0:(quota_master.c:577:filter_quota_adjust()) Process entered
00040000:00000001:27.0:1445357157.777195:0:5009:0:(quota_master.c:580:filter_quota_adjust()) Process leaving (rc=0 : 0 : 0)
00002000:00000001:27.0:1445357157.777197:0:5009:0:(lustre_quota.h:657:lquota_adjust()) Process leaving (rc=0 : 0 : 0)
00002000:00000010:27.0:1445357157.777198:0:5009:0:(filter_log.c:201:filter_recov_log_unlink_cb()) slab-freed '((oa))': 208 at ffff8808362f4cb0.
00002000:00000001:27.0:1445357157.777200:0:5009:0:(filter_log.c:203:filter_recov_log_unlink_cb()) Process leaving (rc=0 : 0 : 0)
00002000:00000001:27.0:1445357157.777202:0:5009:0:(filter_log.c:313:filter_recov_log_mds_ost_cb()) Process leaving (rc=0 : 0 : 0)

and according to the code in b2_1, this cancel should occur by calling either llog_cancel() or filter_cancel_cookies_cb(), directly or as a journal callback, but only if OBD_MD_FLCOOKIE is set in the obdo valid flags; this applies to both the unlink and setattr record cases:

int filter_setattr_internal(struct obd_export *exp, struct dentry *dentry,
                            struct obdo *oa, struct obd_trans_info *oti)
{
        unsigned int orig_ids[MAXQUOTAS] = {0, 0};
        struct llog_cookie *fcc = NULL;
        struct filter_obd *filter;
        int rc, err, sync = 0;

....................

        if (oa->o_valid & OBD_MD_FLCOOKIE) {
                OBD_ALLOC(fcc, sizeof(*fcc));
                if (fcc != NULL)
                        *fcc = oa->o_lcookie;
        }

...................

        if (oa->o_valid & OBD_MD_FLFLAGS) {
                rc = fsfilt_iocontrol(exp->exp_obd, dentry,
                                      FSFILT_IOC_SETFLAGS, (long)&oa->o_flags);
        } else {
                rc = fsfilt_setattr(exp->exp_obd, dentry, handle, &iattr, 1);
                if (fcc != NULL)
                        /* set cancel cookie callback function */
                        sync = fsfilt_add_journal_cb(exp->exp_obd, 0, handle,
                                                     filter_cancel_cookies_cb,    <<<<<<<<<<<<<<<<<<<<<<<
                                                     fcc);
        }

...................

        if (sync) {
                filter_cancel_cookies_cb(exp->exp_obd, 0, fcc, rc);
                fcc = NULL;
        }

..........................

int filter_destroy(struct obd_export *exp, struct obdo *oa,
                   struct lov_stripe_md *md, struct obd_trans_info *oti,
                   struct obd_export *md_exp, void *capa)
{
        unsigned int qcids[MAXQUOTAS] = {0, 0};
        struct obd_device *obd;
        struct filter_obd *filter;
        struct dentry *dchild = NULL, *dparent = NULL;
        struct lustre_handle lockh = { 0 };
        struct lvfs_run_ctxt saved;
        void *handle = NULL;
        struct llog_cookie *fcc = NULL;
        int rc, rc2, cleanup_phase = 0, sync = 0;
        struct iattr iattr;
        unsigned long now;
        ENTRY;

        rc = filter_auth_capa(exp, NULL, oa->o_seq,
                              (struct lustre_capa *)capa, CAPA_OPC_OSS_DESTROY);
        if (rc)
                RETURN(rc);

        obd = exp->exp_obd;
        filter = &obd->u.filter;

        push_ctxt(&saved, &obd->obd_lvfs_ctxt, NULL);
        cleanup_phase = 1;

        CDEBUG(D_INODE, "%s: filter_destroy(group="LPU64",oid="
               LPU64")\n", obd->obd_name, oa->o_seq, oa->o_id);

        dchild = filter_fid2dentry(obd, NULL, oa->o_seq, oa->o_id);
        if (IS_ERR(dchild))
                GOTO(cleanup, rc = PTR_ERR(dchild));
        cleanup_phase = 2;

        if (dchild->d_inode == NULL) {
                CDEBUG(D_INODE, "destroying non-existent object "POSTID"\n",
                       oa->o_id, oa->o_seq);
                /* If object already gone, cancel cookie right now */
                if (oa->o_valid & OBD_MD_FLCOOKIE) {
                        struct llog_ctxt *ctxt;
                        struct obd_llog_group *olg;

                        olg = filter_find_olg(obd, oa->o_seq);
                        if (!olg) {
                                CERROR("%s: can not find olg of group %d "
                                       "for objid "LPU64"\n", obd->obd_name,
                                      (int)oa->o_seq, oa->o_id);
                               GOTO(cleanup, rc = PTR_ERR(olg));
                        }
                        fcc = &oa->o_lcookie;
                        ctxt = llog_group_get_ctxt(olg, fcc->lgc_subsys + 1);
                        llog_cancel(ctxt, NULL, 1, fcc, 0);        <<<<<<<<<<<<<<<
                        llog_ctxt_put(ctxt);
                        fcc = NULL; /* we didn't allocate fcc, don't free it */
                }
                GOTO(cleanup, rc = -ENOENT);
        }

        rc = filter_prepare_destroy(obd, oa->o_id, oa->o_seq, &lockh);
        if (rc)
                GOTO(cleanup, rc);

        /* Our MDC connection is established by the MDS to us */
        if (oa->o_valid & OBD_MD_FLCOOKIE) {
                OBD_ALLOC(fcc, sizeof(*fcc));
                if (fcc != NULL)
                        *fcc = oa->o_lcookie;   <<<<<<<<<<<<<<<<<<<<<<<<<<<<
        }

        /* we're gonna truncate it first in order to avoid possible deadlock:
         *      P1                      P2
         * open trasaction      open transaction
         * down(i_zombie)       down(i_zombie)
         *                      restart transaction
         * (see BUG 4180) -bzzz
         *
         * take i_alloc_sem too to prevent other threads from writing to the
         * file while we are truncating it. This can cause lock ordering issue
         * between page lock, i_mutex & starting new journal handle.
         * (see bug 20321) -johann
         */
        now = jiffies;
        down_write(&dchild->d_inode->i_alloc_sem);
        LOCK_INODE_MUTEX(dchild->d_inode);
        fsfilt_check_slow(exp->exp_obd, now, "i_alloc_sem and i_mutex");

        /* VBR: version recovery check */
        rc = filter_version_get_check(exp, oti, dchild->d_inode);
        if (rc) {
                UNLOCK_INODE_MUTEX(dchild->d_inode);
                up_write(&dchild->d_inode->i_alloc_sem);
                GOTO(cleanup, rc);
        }

        handle = fsfilt_start_log(obd, dchild->d_inode, FSFILT_OP_SETATTR,
                                  NULL, 1);
        if (IS_ERR(handle)) {
                UNLOCK_INODE_MUTEX(dchild->d_inode);
                up_write(&dchild->d_inode->i_alloc_sem);
                GOTO(cleanup, rc = PTR_ERR(handle));
        }

        /* Locking order: i_mutex -> journal_lock -> dqptr_sem. LU-952 */
        ll_vfs_dq_init(dchild->d_inode);

        iattr.ia_valid = ATTR_SIZE;
        iattr.ia_size = 0;
        rc = fsfilt_setattr(obd, dchild, handle, &iattr, 1);
        rc2 = fsfilt_commit(obd, dchild->d_inode, handle, 0);
        UNLOCK_INODE_MUTEX(dchild->d_inode);
        up_write(&dchild->d_inode->i_alloc_sem);
        if (rc)
                GOTO(cleanup, rc);
        if (rc2)
                GOTO(cleanup, rc = rc2);

        /* We don't actually need to lock the parent until we are unlinking
         * here, and not while truncating above.  That avoids holding the
         * parent lock for a long time during truncate, which can block other
         * threads from doing anything to objects in that directory. bug 7171 */
        dparent = filter_parent_lock(obd, oa->o_seq, oa->o_id);
        if (IS_ERR(dparent))
                GOTO(cleanup, rc = PTR_ERR(dparent));
        cleanup_phase = 3; /* filter_parent_unlock */

        LOCK_INODE_MUTEX(dchild->d_inode);
        handle = fsfilt_start_log(obd, dparent->d_inode,FSFILT_OP_UNLINK,oti,1);
        if (IS_ERR(handle)) {
                UNLOCK_INODE_MUTEX(dchild->d_inode);
                GOTO(cleanup, rc = PTR_ERR(handle));
        }
        cleanup_phase = 4; /* fsfilt_commit */

        /* Quota release need uid/gid of inode */
        obdo_from_inode(oa, dchild->d_inode, NULL, OBD_MD_FLUID|OBD_MD_FLGID);

        filter_fmd_drop(exp, oa->o_id, oa->o_seq);

        /* this drops dchild->d_inode->i_mutex unconditionally */
        rc = filter_destroy_internal(obd, oa->o_id, oa->o_seq, dparent, dchild);

        EXIT;
cleanup:
        switch(cleanup_phase) {
        case 4:
                if (fcc != NULL)
                        sync = fsfilt_add_journal_cb(obd, 0, oti ?
                                                     oti->oti_handle : handle,
                                                     filter_cancel_cookies_cb, <<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                                     fcc);
                /* If add_journal_cb failed, then filter_finish_transno
                 * will commit the handle and we will do a sync
                 * on commit. then we call callback directly to free
                 * the fcc.
                 */
                rc = filter_finish_transno(exp, NULL, oti, rc, sync);
                if (sync) {
                        filter_cancel_cookies_cb(obd, 0, fcc, rc); <<<<<<<<<<<<<<<<<<<<<<<<<<
                        fcc = NULL;
                }
                rc2 = fsfilt_commit(obd, dparent->d_inode, handle, 0);
                if (rc2) {
                        CERROR("error on commit, err = %d\n", rc2);
                        if (!rc)
                                rc = rc2;
                } else {
                        fcc = NULL;
                }
        case 3:
                filter_parent_unlock(dparent);
        case 2:
                filter_fini_destroy(obd, &lockh);

                f_dput(dchild);
                if (fcc != NULL)
                        OBD_FREE(fcc, sizeof(*fcc));
        case 1:
                pop_ctxt(&saved, &obd->obd_lvfs_ctxt, NULL);
                break;
        default:
                CERROR("invalid cleanup_phase %d\n", cleanup_phase);
                LBUG();
        }

        /* trigger quota release */
        qcids[USRQUOTA] = oa->o_uid;
        qcids[GRPQUOTA] = oa->o_gid;
        rc2 = lquota_adjust(filter_quota_interface_ref, obd, qcids, NULL, rc,
                            FSFILT_OP_UNLINK);
        if (rc2)
                CERROR("filter adjust qunit! (rc:%d)\n", rc2);
        return rc;
}

then, having a look to the code of callers :

/* Callback for processing the unlink log record received from MDS by
 * llog_client_api. */
static int filter_recov_log_unlink_cb(struct llog_ctxt *ctxt,
                                      struct llog_rec_hdr *rec,
                                      struct llog_cookie *cookie)
{
        struct obd_export *exp = ctxt->loc_obd->obd_self_export;
        struct llog_unlink_rec *lur;
        struct obdo *oa;
        obd_count count;
        int rc = 0;
        ENTRY;

        lur = (struct llog_unlink_rec *)rec;
        OBDO_ALLOC(oa);
        if (oa == NULL)
                RETURN(-ENOMEM);
        oa->o_valid |= OBD_MD_FLCOOKIE;  <<<<<<<<<<<<<<<<<<<<<<<<<
        oa->o_id = lur->lur_oid;
        oa->o_seq = lur->lur_oseq;
        oa->o_valid = OBD_MD_FLID | OBD_MD_FLGROUP;  <<<<<<<<<<<<<<<<<< bad !!!
        oa->o_lcookie = *cookie;
        /* objid gap may require to destroy several objects in row */
        count = lur->lur_count + 1;

        /* This check is only valid before FID-on-OST and it should
         * be removed after FID-on-OST is implemented */
        if (oa->o_seq > FID_SEQ_OST_MAX) {
                CERROR("%s: invalid group number "LPU64" > MAX_CMD_GROUP %u\n",
                        exp->exp_obd->obd_name, oa->o_seq, FID_SEQ_OST_MAX);
                RETURN(-EINVAL);
        }

        while (count > 0) {
                rc = filter_destroy(exp, oa, NULL, NULL, NULL, NULL);
                if (rc == 0)
                        CDEBUG(D_RPCTRACE, "object "LPU64" is destroyed\n",
                               oa->o_id);
                else if (rc != -ENOENT)
                        CEMERG("error destroying object "LPU64": %d\n",
                               oa->o_id, rc);
                else
                        rc = 0;
                count--;
                oa->o_id++;
        }
        OBDO_FREE(oa);

        RETURN(rc);
}

/* Callback for processing the setattr log record received from MDS by
 * llog_client_api. */
static int filter_recov_log_setattr_cb(struct llog_ctxt *ctxt,
                                       struct llog_rec_hdr *rec,
                                       struct llog_cookie *cookie)
{
        struct obd_device *obd = ctxt->loc_obd;
        struct obd_export *exp = obd->obd_self_export;
        struct obd_info oinfo = { { { 0 } } };
        obd_id oid;
        int rc = 0;
        ENTRY;

        OBDO_ALLOC(oinfo.oi_oa);
        if (oinfo.oi_oa == NULL)
                RETURN(-ENOMEM);

        if (rec->lrh_type == MDS_SETATTR_REC) {
                struct llog_setattr_rec *lsr = (struct llog_setattr_rec *)rec;

                oinfo.oi_oa->o_id = lsr->lsr_oid;
                oinfo.oi_oa->o_seq = lsr->lsr_oseq;
                oinfo.oi_oa->o_uid = lsr->lsr_uid;
                oinfo.oi_oa->o_gid = lsr->lsr_gid;
        } else {
                struct llog_setattr64_rec *lsr = (struct llog_setattr64_rec *)rec;

                oinfo.oi_oa->o_id = lsr->lsr_oid;
                oinfo.oi_oa->o_seq = lsr->lsr_oseq;
                oinfo.oi_oa->o_uid = lsr->lsr_uid;
                oinfo.oi_oa->o_gid = lsr->lsr_gid;
        }

        oinfo.oi_oa->o_valid |= (OBD_MD_FLID | OBD_MD_FLUID | OBD_MD_FLGID |
                                 OBD_MD_FLCOOKIE); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
        oinfo.oi_oa->o_valid = OBD_MD_FLID | OBD_MD_FLGROUP; <<<<<<<<<<<<<<<<<<<<< bad !!
        oinfo.oi_oa->o_lcookie = *cookie;
        oid = oinfo.oi_oa->o_id;

        rc = filter_setattr(exp, &oinfo, NULL);
        OBDO_FREE(oinfo.oi_oa);

        if (rc == -ENOENT) {
                CDEBUG(D_RPCTRACE, "object already removed, send cookie\n");
                llog_cancel(ctxt, NULL, 1, cookie, 0);
                RETURN(0);
        }

        if (rc == 0)
                CDEBUG(D_RPCTRACE, "object "LPU64" is chown/chgrp\n", oid);

        RETURN(rc);
}

I will push a b2_1 patch to fix this now and see how it works.
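
For clarity, here is a minimal sketch of the kind of change the analysis above implies for the two callbacks (my reading only; the actual change is in the patch referenced below): set all the needed flags in a single assignment so that OBD_MD_FLCOOKIE is no longer wiped out by the second assignment.

        /* filter_recov_log_unlink_cb(): sketch, keep the cookie flag */
        oa->o_valid = OBD_MD_FLID | OBD_MD_FLGROUP | OBD_MD_FLCOOKIE;

        /* filter_recov_log_setattr_cb(): sketch, keep the cookie and uid/gid flags */
        oinfo.oi_oa->o_valid = OBD_MD_FLID | OBD_MD_FLGROUP | OBD_MD_FLUID |
                               OBD_MD_FLGID | OBD_MD_FLCOOKIE;

With the cookie flag preserved, filter_destroy()/filter_setattr_internal() should copy the cookie and register filter_cancel_cookies_cb() (or call llog_cancel() directly in the already-removed-object case), so the LLOG records get canceled and the plain LLOG files can be released.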

Comment by Gerrit Updater [ 21/Oct/15 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/16906
Subject: LU-6612 obdfilter: do not overwrite OBD_MD_FLCOOKIE flag
Project: fs/lustre-release
Branch: b2_1
Current Patch Set: 1
Commit: 1e04568d15e0c623faa8256d0583552a79331a9f

Comment by Bruno Faccini (Inactive) [ 02/Nov/15 ]

The debug patch at http://review.whamcloud.com/16373 has been abandoned.

Comment by Manish Patel (Inactive) [ 04/Nov/15 ]

Hi Bruno,

I see that the patch http://review.whamcloud.com/16373 is abandoned and that there is one more patch in progress, http://review.whamcloud.com/#/c/16906/. Do you want us to apply either of those patches, and do you need anything from our end?

Also, were you able to reproduce the issue in your test lab, and if so, did your fix patch resolve it?

Thank You,
Manish

Comment by Bruno Faccini (Inactive) [ 05/Nov/15 ]

Hello Manish,
The #16373 debug patch has been abandoned because the bug has been found and the real fix has been pushed as #16906.
Yes, I have been able to reproduce the issue/leak in the lab; no, I have not yet had time to verify the fix. But even though the bug was quite difficult to identify, the fix is definitely obvious.
So it would be nice if you could give patch #16906 a try.

Comment by Oz Rentas [ 05/Sep/17 ]

This is quite old and no longer an active issue. Please close.

Comment by Peter Jones [ 05/Sep/17 ]

ok - thanks Oz

Comment by Gerrit Updater [ 26/Jul/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/15654/
Subject: LU-6612 utils: strengthen llog_reader vs wrong format/header
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 45291b8c06eebf33d3654db3a7d3cfc5836004a6

Comment by Gerrit Updater [ 23/Aug/22 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/48309
Subject: LU-6612 utils: strengthen llog_reader vs wrong format/header
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 7e3f7d1792eba8f3001a618de48397f05046e5f6

Comment by Gerrit Updater [ 17/Oct/22 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48900
Subject: LU-6612 utils: strengthen llog_reader vs wrong format/header
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 6e98a6e0e0c6bce20c5f3cb8fae1b0ffd9532efb

Comment by Gerrit Updater [ 02/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48900/
Subject: LU-6612 utils: strengthen llog_reader vs wrong format/header
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: badba63a54e905129dbdf28e31026580453ea337
