[LU-6612] (llog_obd.c:346:cat_cancel_cb()) cancel log + llog_obd.c:315:cat_cancel_cb()) processing log + llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile Created: 18/May/15 Updated: 26/Sep/23 Resolved: 05/Sep/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.6 |
| Fix Version/s: | Lustre 2.16.0, Lustre 2.15.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Manish Patel (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None |
| Environment: | Lustre Server 2.1.6 |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hi,

We are seeing that the MDT occupies a lot of disk space, and most of the space is taken up by the OBJECTS/* files of the MDT. We took a backup of the MDT, restored the same MDT in the test lab, deleted all the files residing in OBJECTS/*, removed the CATALOG file, remounted with -t lustre, and collected the logs. Below is a snippet showing the pattern of logs generated for each file removed:

00000040:00020000:0.0:1431530471.926250:0:17345:0:(llog_obd.c:320:cat_cancel_cb()) Cannot find handle for log 0x2793959b: -116
00000040:00080000:0.0:1431530471.926254:0:17345:0:(llog_obd.c:346:cat_cancel_cb()) cancel log 0x2793959b:f6d1b675 at index 58627 of catalog 0x13600004
00000040:00080000:0.0:1431530471.926255:0:17345:0:(llog_obd.c:315:cat_cancel_cb()) processing log 0x2792f036:f6d21832 at index 58628 of catalog 0x13600004
00000040:00020000:0.0:1431530471.926259:0:17345:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile 0x2792f036:0xf6d21832: rc -116
00000040:00020000:0.0:1431530471.926261:0:17345:0:(llog_cat.c:174:llog_cat_id2handle()) error opening log id 0x2792f036:f6d21832: rc -116
00000040:00020000:0.0:1431530471.926262:0:17345:0:(llog_obd.c:320:cat_cancel_cb()) Cannot find handle for log 0x2792f036: -116
00000040:00080000:0.0:1431530471.926265:0:17345:0:(llog_obd.c:346:cat_cancel_cb()) cancel log 0x2792f036:f6d21832 at index 58628 of catalog 0x13600004
00000040:00080000:0.0:1431530471.926267:0:17345:0:(llog_obd.c:315:cat_cancel_cb()) processing log 0x2793959d:f6d27b18 at index 58629 of catalog 0x13600004
00000040:00020000:0.0:1431530471.926270:0:17345:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile 0x2793959d:0xf6d27b18: rc -116
...

This seems to be the Lustre bug related to the

Thank You, |
| Comments |
| Comment by Bruno Faccini (Inactive) [ 19/May/15 ] |
|
Since the format of the following log line confirms that you are running with the patch for

00000040:00020000:0.0:1431530471.926262:0:17345:0:(llog_obd.c:320:cat_cancel_cb()) Cannot find handle for log 0x2792f036: -116

I think you are safe. |
| Comment by Manish Patel (Inactive) [ 19/May/15 ] |
|
Hi Bruno, Yes, I still see those messages. On the other hand, do we still have to worry about the file system data, or do these error messages have nothing to do with the consistency of the data? Can you confirm? Thank You, |
| Comment by Bruno Faccini (Inactive) [ 19/May/15 ] |
|
Still with the same log names/IDs and index numbers? Hmm, in fact after re-reading your submission text, I wonder whether your OSTs were stopped at the time you removed the files from the OBJECTS directory and the CATALOG file on your MDT. If not, the messages may come from old/invalid LLOG references from the OSTs. If this has been the case, you may find corresponding messages on the OSSs; can you check? Last, are you using ChangeLogs? |
| Comment by Andreas Dilger [ 19/May/15 ] |
|
Manish, the OBJECTS files are still in use in Lustre 2.1, so they can't just be deleted - they are used to track files being deleted on the MDT but not yet deleted on the OSTs. It is only with Lustre 2.5 and later that the OBJECTS files are obsolete. I'd agree with Bruno - it seems that these OBJECTS files are not being cleaned up because either the ChangeLog has been enabled but there is no consumer, or there is a missing or disabled OST that has objects deleted (or ownership changed) and the MDS is logging these changes until the OST is returned. Bruno, it is worthwhile to check whether disabling all ChangeLog consumers will delete the ChangeLog records, or whether they are left behind. |
| Comment by Manish Patel (Inactive) [ 19/May/15 ] |
|
Hi Andreas, The changelog was enabled for a period of time until it filled up, which resulted in client mount issues ("Lustre clients receive -14 "Bad address" from MDS [REF:1638415019758]"), and then the changelog user was unregistered to fix that. The current state is as follows:

[root@dc2mds01 ~]# cat /proc/fs/lustre/mdd/dc2-MDT0000/changelog_users
[root@dc2mds01 ~]# cat /proc/fs/lustre/mdd/dc2-MDT0000/changelog_mask

Thank You, |
| Comment by Bruno Faccini (Inactive) [ 21/May/15 ] |
|
Manish, |
| Comment by Manish Patel (Inactive) [ 22/May/15 ] |
|
Hi Bruno, The test the customer performed was on an isolated test server and network. I did not use any OSTs in my tests. Here is another question from the customer: "I wasn't aware that interacting with the MDT while mounted as ldiskfs could trigger Lustre operations with the OSTs. I suppose I need to do a bit more reading." Currently the customer is planning, during the upcoming maintenance, to resize the grown OI.16 file to free up some space. If this resize process of the oi.16 file creates issues, can we simply restore the original oi.16 file, or must we restore the entire dd-based MDT backup? In either restoration case, how would we track/clean up any orphaned objects on the OSTs? Could we record enough information about every file we create during testing to use for a surgical cleanup? If so, how? Or must we perform a full file system lfsck? Would we run into problems if we reverted to our backup and left a few hundred orphaned OST objects? Finally, are we OK to delete those files from the OBJECTS directory with Lustre v2.1.6 to reclaim the extra space they occupy on the MDT, or should we upgrade Lustre to v2.5.x or a higher version and then delete the files residing in the OBJECTS directory? Thank You, |
| Comment by Bruno Faccini (Inactive) [ 27/May/15 ] |
|
Manish, And now you speak about removing the OI.16.* files?? Can you really explain what the purpose of this test is? To prepare a 2.5 upgrade with a manual pre-cleaning of the MDT, and a possible downgrade due to problems? If this is the case, just do the upgrade and then run lfsck; it will do the cleanup for you, and better than all of these manual and risky actions, you can trust me! To answer some of your numerous questions:
_ if you remove all OI.16.* files, they will be recreated upon MDT start/mount;
_ removing OBJECTS/* and the CATALOG file should work, even if there may be a few possible side effects/msgs (like orphaned objects on OSTs, ...);
_ if you want to record all the info needed to manually clean orphans on the OSTs after an MDT restore, the best choice is RobinHood; the "offline" lfsck version available in 2.1 can also do it for you, and the "online" lfsck version available in 2.5 will do it much better and more easily! |
| Comment by Manish Patel (Inactive) [ 01/Jun/15 ] |
|
Hi Bruno, Yes, you understood correctly; we were just trying to see if there is any risk involved in the upcoming maintenance process, so we decided to postpone removing the files from the OBJECTS directory and will work on that part with the Lustre 2.5.x releases. Yes, we hit the bug DDN-77, and to fix that issue we have to go through this process. Right now the customer is still testing Lustre v2.5 and wants to keep the move to v2.5 on hold, so until then we have to go through this process of fixing the OI.16 file under Lustre v2.1.6. Thank You, |
| Comment by Bruno Faccini (Inactive) [ 16/Jun/15 ] |
|
Hello Manish, sorry to be late on this. |
| Comment by Manish Patel (Inactive) [ 17/Jun/15 ] |
|
Hi Bruno, We already did e.g. "mv OBJECTS OBJECTS.DDN-191; mv CATALOGS CATALOGS.DDN-191;" on that directory when we encountered issues with hitting the directory index limit in the DDN-191 ticket. Currently the customer is still on Lustre v2.1.6, and we have already renamed the original OBJECTS directory, which contains 487G of llogs. Is it safe to delete the old OBJECTS directory to reclaim space during our next maintenance window? Why do logs continue to accumulate in the OBJECTS directory, and is there something we can do to stop it? Thank You, |
| Comment by Manish Patel (Inactive) [ 23/Jun/15 ] |
|
Hi Bruno, Andreas, Any updates/comments on my previous questions? The next maintenance window is getting closer and we have to plan accordingly, so please advise on this issue. Thank You, |
| Comment by Bruno Faccini (Inactive) [ 26/Jun/15 ] |
|
Hello Manish, sorry to be late on this. And yes, since you have indicated that you are already running fine with the moved OBJECTS/CATALOGS directory/file, it should be fine to remove the old/renamed OBJECTS directory, but I would suggest you do this during dedicated downtime with all clients/servers unmounted. |
| Comment by Manish Patel (Inactive) [ 01/Jul/15 ] |
|
Hi Bruno, The customer tried to run that command on their test cluster with the backup MDT data and got an error:

dc-oss01# llog_reader CATALOGS

Do you know of any alternative workaround to get that data output? Their test cluster has 48G of RAM and the production cluster has 128G. Thank You, |
| Comment by Bruno Faccini (Inactive) [ 06/Jul/15 ] |
|
Hello Manish, this code in llog_reader:

/* the llog header not countable here.*/
recs_num = le32_to_cpu((*llog)->llh_count)-1;
recs_buf = malloc(recs_num * sizeof(struct llog_rec_hdr *));

can easily return -ENOMEM if the (*llog)->llh_count value was mistakenly read as 0. Can you attach the "binary copy of CATALOGS file" I have already asked you for? And also the other infos I requested? Also, have you been able to remove the old/moved OBJECTS directory content/files?
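For illustration, here is a minimal standalone sketch (plain C, not the llog_reader sources themselves) of how an llh_count read back as 0 turns into an allocation failure: recs_num becomes -1 and, once promoted to size_t in the multiplication, becomes a huge request that malloc() rejects:

#include <stdio.h>
#include <stdlib.h>

struct llog_rec_hdr;                       /* opaque here; only the pointer size matters */

int main(void)
{
        unsigned int llh_count = 0;        /* header field mistakenly read as 0 */
        int recs_num = (int)llh_count - 1; /* -1, mirroring the llog_reader computation */
        void *recs_buf = malloc(recs_num * sizeof(struct llog_rec_hdr *));

        if (recs_buf == NULL)              /* the oversized request fails */
                printf("Memory Alloc for recs_buf error.\n");
        free(recs_buf);
        return 0;
}

Here recs_num is an int, as in llog_reader, so the negative value wraps when promoted to size_t. |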
| Comment by Manish Patel (Inactive) [ 06/Jul/15 ] |
|
Hi Bruno, Can you be more specific about the "binary copy of CATALOGS file"? Do we just need to mount the MDT with ldiskfs and use a normal cp command to copy that CATALOGS file and send it to you, or is there some different process involved? Thank You, |
| Comment by Bruno Faccini (Inactive) [ 07/Jul/15 ] |
|
Correct, just a raw copy of the CATALOGS file. |
| Comment by Manish Patel (Inactive) [ 07/Jul/15 ] |
|
Here is the CATALOGS file from the MDT. |
| Comment by Manish Patel (Inactive) [ 07/Jul/15 ] |
|
Hi Bruno, Attached is a copy of the current CATALOGS file. https://jira.hpdd.intel.com/secure/attachment/18394/CATALOGS.20150707

The llog_reader command failed with the following error:

Memory Alloc for recs_buf error.

So please advise how to get that output and what needs to be done to get rid of that "recs_buf error". We removed the OLD CATALOGS and OBJECTS directories, which we renamed about a month ago. The customer site has maintenance scheduled today, so it would be nice to have the next step before we run out of this maintenance window. Thank You, |
| Comment by Bruno Faccini (Inactive) [ 07/Jul/15 ] |
|
Thanks for the CATALOGS binary file, but what about the :

Concerning the "recs_buf error" from the "llog_reader CATALOGS" cmd, it should come from the fact that llog_reader does not know how to interpret the CATALOGS format, but I can manage to get the interesting stuff from the CATALOGS binary file. |
| Comment by Manish Patel (Inactive) [ 10/Jul/15 ] |
|
Bruno requested the rest of the data. |
| Comment by Manish Patel (Inactive) [ 10/Jul/15 ] |
|
Hi Bruno, Here are the other requested files. https://jira.hpdd.intel.com/secure/attachment/18414/ddn_sr44330_lctl_dl.txt Thank You, |
| Comment by Bruno Faccini (Inactive) [ 20/Jul/15 ] |
|
Hello Manish,
_ The llog_reader.txt file again highlights the llog_reader error when used on the CATALOG file, which is not of the expected LLOG format. I will push a patch soon to have llog_reader gracefully handle such a case.
_ The ddn_sr44330_ls_objects.txt file shows that there are 336 "100uvw:68b06xyz" small files (2 per OST, and half of them the only ones referenced in the CATALOG file) in the OBJECTS/ dir. All are dated "Jun 9 10:56". These should be the normal OSP sync LLOG catalogs.
_ The ddn_sr44330_ls_objects.txt file also shows 190 more recent and other files of reasonable size (all < 3MB). Are these files the ones you suspect to leak and to have consumed huge space on your MDT before you removed them?
_ Last, do you still see the "(llog_obd.c:320:cat_cancel_cb()) |
| Comment by Manish Patel (Inactive) [ 20/Jul/15 ] |
|
Hi Bruno, We have already cured the huge oi.16 file issue and it has shrunk to normal size. Regarding the rest of the log-file-related questions, I will have to check with the customer to get their input, and I will update this thread. Thank You, |
| Comment by Gerrit Updater [ 20/Jul/15 ] |
|
Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/15654 |
| Comment by Manish Patel (Inactive) [ 23/Jul/15 ] |
|
Hi Bruno, It seems that any file over a few hours old in the OBJECTS directory is leaked. It would be nice if you could take a look at the ls -lah output from the MDT OBJECTS directory in the attached file "find_exec_ls_alh_manish.txt" for a more complete history of the number of files, their timestamps, and their sizes. The file lists most of the nearly 74k files that were removed during our last maintenance window. https://jira.hpdd.intel.com/secure/attachment/18474/find_exec_ls_alh_manish.txt

The most recent llog_obd.c:320:cat_cancel_cb() error logged was on 7/9 for a different log:

Jul 8 00:06:11 dc2mds01 kernel: : LustreError: 8949:0:(llog_obd.c:320:cat_cancel_cb()) Cannot find handle for log 0x27924bbc: -116
Jul 8 00:06:11 dc2mds01 kernel: : LustreError: 8949:0:(llog_obd.c:320:cat_cancel_cb()) Skipped 26419 previous similar messages

Thank You, |
| Comment by Manish Patel (Inactive) [ 30/Jul/15 ] |
|
Hi Bruno, The customer is scheduled to have a maintenance window next Tuesday, so can you take a look at the file I listed in my previous post and see if you need anything else collected from the customer during the maintenance downtime? Thank You, |
| Comment by Bruno Faccini (Inactive) [ 31/Jul/15 ] |
|
Hello Manish, |
| Comment by Manish Patel (Inactive) [ 31/Jul/15 ] |
|
Hi Bruno, The customer took the MDT backup, loaded the MDT data on their test cluster, and mounted it with ldiskfs; that data was captured on May-12-2015. The command used to get the file data in "find_exec_ls_alh_manish.txt" is as below:

find OBJECTS -name "[0-9][0-9]*:[0-9a-f][0-9a-f]*" -exec ls -lah {} \;
Let me know if you need anything else to troubleshoot further. Thank You, |
| Comment by Bruno Faccini (Inactive) [ 03/Aug/15 ] |
|
Hmm, I was also asking for some feedback about the differences I found in the previous ddn_sr44330_ls_objects.txt file/infos... |
| Comment by Manish Patel (Inactive) [ 04/Aug/15 ] |
|
Hi Bruno, The customer collected a listing of the OBJECTS directory, a copy of the CATALOGS file, and a tarball of the OBJECTS directory. Here is the new data you requested. http://ddntsr.com/ftp/2015-08-04-SR44330_20150804.tar Let us know if you need anything else. Thank You, |
| Comment by Manish Patel (Inactive) [ 20/Aug/15 ] |
|
Hi Bruno, Any updates on the data I uploaded in the previous post? Thank You, |
| Comment by Bruno Faccini (Inactive) [ 01/Sep/15 ] |
|
Manish, |
| Comment by Manish Patel (Inactive) [ 01/Sep/15 ] |
|
Hi Bruno, I have uploaded the logs to the Intel FTP site with the directory name to this Jira case. Thank You, |
| Comment by Bruno Faccini (Inactive) [ 03/Sep/15 ] |
|
Adding a recent email/answer sent to the concerned people, including Manish:

Since Manish is in copy, I am using this email to ask him for more infos:

Thanks again for your help, |
| Comment by Bruno Faccini (Inactive) [ 03/Sep/15 ] |
|
I have done more work on the latest provided data, and here are the results:

So |
| Comment by Manish Patel (Inactive) [ 03/Sep/15 ] |
|
Hi Bruno, -> Yes, this site has 168 OSTs configured. I will have to get the rest of the data from the customer and will post it once I get those details. Also, how do I check the "osp_sync_thread" state on the MDS; is there a command I can use for that? Thank You, |
| Comment by Peter Jones [ 03/Sep/15 ] |
|
Bruno
Peter |
| Comment by Bruno Faccini (Inactive) [ 03/Sep/15 ] |
|
Since it is established that our 2.1 problem could not be related/linked to |
| Comment by Manish Patel (Inactive) [ 03/Sep/15 ] |
|
Hi Bruno, I have uploaded the logs to the Intel FTP under

The file system has been restarted so many times that it's going to be hard to get all that data, but here is the recent activity:

06-03-2015: the file system was taken down for the oi.16 shrink procedure and restarted during a scheduled maintenance window.

Thank You, |
| Comment by Bruno Faccini (Inactive) [ 04/Sep/15 ] |
|
Thanks for providing the logs.

messages:Sep 2 04:05:50 dc2oss12 kernel: : LustreError: 14888:0:(ldlm_resource.c:1090:ldlm_resource_get()) lvbo_init failed for resource 110829994: rc -2
messages:Sep 2 04:05:50 dc2oss12 kernel: : LustreError: 7212:0:(filter.c:3136:__filter_oa2dentry()) dc2-OST0075: filter_sync on non-existent object: 110829994:0
messages:Sep 2 04:05:50 dc2oss12 kernel: : LustreError: 7212:0:(ost_handler.c:1648:ost_blocking_ast()) Error -2 syncing data on lock cancel

I am presently analyzing the related code path to see if it can be the cause of the leaking MDS_UNLINK_REC records in the OSP-SYNC LLOGs. |
| Comment by Bruno Faccini (Inactive) [ 07/Sep/15 ] |
|
There are also the following error msgs from filter_destroy() which could be much more related:

$ grep filter_destroy es_lustre_showall_2015-09-03_151236/*oss*/logs/messages-*
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 03:23:13 dc2oss01 kernel: : [<ffffffffa0ca5a24>] filter_destroy+0xf24/0x13d0 [obdfilter]
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 03:24:56 dc2oss01 kernel: : LustreError: 15252:0:(filter.c:4267:filter_destroy()) error on commit, err = -30
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 03:24:56 dc2oss01 kernel: : LustreError: 15252:0:(filter.c:4267:filter_destroy()) Skipped 1 previous similar message
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:45 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy()) dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:45 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy()) dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:57 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy()) dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:57 dc2oss01 kernel: : LustreError: 16201:0:(filter.c:4151:filter_destroy()) Skipped 7 previous similar messages
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:59 dc2oss01 kernel: : LustreError: 16189:0:(filter.c:4151:filter_destroy()) dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:19:59 dc2oss01 kernel: : LustreError: 16189:0:(filter.c:4151:filter_destroy()) Skipped 9 previous similar messages
es_lustre_showall_2015-09-03_151236/dc2oss01/logs/messages-20150816:Aug 14 08:20:01 dc2oss01 kernel: : LustreError: 16189:0:(filter.c:4151:filter_destroy()) dc2-OST0006: can not find olg of group 0
es_lustre_showall_2015-09-03_151236/dc2oss02/logs/messages-20150816:Aug 15 10:56:03 dc2oss02 kernel: : [<ffffffffa0c2f218>] ? filter_destroy+0x718/0x13d0 [obdfilter]
es_lustre_showall_2015-09-03_151236/dc2oss08/logs/messages-20150823:Aug 18 06:29:24 dc2oss08 kernel: : [<ffffffffa0cb2218>] filter_destroy+0x718/0x13d0 [obdfilter]
es_lustre_showall_2015-09-03_151236/dc2oss10/logs/messages-20150809:Aug 5 14:31:51 dc2oss10 kernel: : LustreError: 15496:0:(filter.c:1621:filter_destroy_internal()) destroying objid 94281010 ino 36943382 nlink 1 count 3

and these could be the starting point of the MDS_UNLINK_REC record leakage and of the plain LLOG files containing them.
Since the problem seems to occur on any OST, I have decided to concentrate on OST0000, and this OST's catalog presently only contains 5 references/records of plain LLOGs:

Header size : 8192
Time : Tue Jun 9 07:48:06 2015
Number of records: 5
Target uuid :
-----------------------
#01 (064)ogen=68B06AFD name=0:1048793
#102 (064)ogen=8E05AC60 name=0:1049057
#103 (064)ogen=B52DE2C3 name=0:1049144
#104 (064)ogen=EEE3D5B name=0:1049420
#193 (064)ogen=B9CE5F6A name=0:1049449

If we use the 1st one, it contains (like the others) a few valid records:

rec #1 type=10612404 len=40 <<<<< mainly MDS_UNLINK_REC records
rec #2 type=10612404 len=40
rec #3 type=10612404 len=40
rec #4 type=10612404 len=40
rec #5 type=10612404 len=40
rec #6 type=10612404 len=40
rec #7 type=10612404 len=40
rec #8 type=10612404 len=40
rec #9 type=10612404 len=40
rec #10 type=10612404 len=40
rec #11 type=10612404 len=40
rec #12 type=10612404 len=40
rec #13 type=10612404 len=40
rec #14 type=10612404 len=40
rec #15 type=10612404 len=40
rec #19 type=10612404 len=40
rec #20 type=10612404 len=40
rec #21 type=10612404 len=40
rec #24 type=10612404 len=40
rec #27 type=10612404 len=40
rec #28 type=10612404 len=40
rec #29 type=10612404 len=40
rec #30 type=10612404 len=40
rec #32 type=10612404 len=40
rec #33 type=10612404 len=40
rec #37 type=10612404 len=40
rec #38 type=10612404 len=40
rec #39 type=10612404 len=40
rec #42 type=10612404 len=40
rec #43 type=10612404 len=40
rec #44 type=10612404 len=40
rec #45 type=10612404 len=40
rec #46 type=10612404 len=40
rec #47 type=10612404 len=40
rec #52 type=10612404 len=40
rec #53 type=10612404 len=40
rec #54 type=10612404 len=40
rec #55 type=10612404 len=40
rec #56 type=10612404 len=40
rec #57 type=10612404 len=40
rec #58 type=10692401 len=56 <<<<< but also a MDS_SETATTR64_REC
rec #59 type=10612404 len=40
rec #60 type=10612404 len=40
rec #61 type=10612404 len=40
rec #62 type=10612404 len=40
rec #63 type=10612404 len=40
rec #64 type=10612404 len=40
rec #65 type=10612404 len=40
rec #66 type=10612404 len=40
rec #67 type=10612404 len=40
rec #68 type=10612404 len=40
rec #69 type=10612404 len=40
rec #70 type=10612404 len=40
rec #71 type=10612404 len=40
rec #72 type=10612404 len=40
rec #73 type=10612404 len=40
rec #74 type=10612404 len=40
rec #75 type=10612404 len=40
rec #76 type=10612404 len=40
rec #77 type=10612404 len=40
rec #78 type=10612404 len=40
rec #79 type=10612404 len=40
rec #80 type=10612404 len=40
rec #81 type=10612404 len=40
rec #82 type=10612404 len=40
rec #83 type=10612404 len=40
rec #84 type=10612404 len=40
rec #85 type=10612404 len=40
rec #86 type=10612404 len=40
rec #87 type=10612404 len=40
rec #88 type=10612404 len=40
rec #89 type=10612404 len=40

Thus, could you check whether OBJID 0x532cb79 (87214969) still exists on OST0000, and also whether OBJID 0x532ccc6 (87215302) still exists on OST0000, the latter with 0xb8517 (754967) uid and 0xd2 (210) gid?

debugfs -R "stat O/0/d25/87214969" /dev/...
debugfs -R "stat O/0/d6/87215302" /dev/...

It would also be very helpful if you could again provide a similar package containing the MDS's CATALOGS and OBJECTS/* files, for me to understand the problem's evolution/consequences based on the recently provided logs content. |
| Comment by Manish Patel (Inactive) [ 08/Sep/15 ] |
|
Hi Bruno, Can you provide more details on what exactly you need for "It would also be very helpful if you could again provide a similar package containing the MDS's CATALOGS and OBJECTS/* files, for me to understand the problem's evolution/consequences based on the recently provided logs content."? Also, if you need any output for the 0xb8517 (754967) uid and 0xd2 (210) gid, let us know the command to gather that data. I will ask the customer to get the debugfs output and will update the post. Thank You, |
| Comment by Bruno Faccini (Inactive) [ 09/Sep/15 ] |
|
> Can you provide more details on what exactly you need for

> Also do you need any out with 0xb8517 (754967) uid and 0xd2 (210) gid, then let us know the command to gather those data. |
| Comment by Manish Patel (Inactive) [ 09/Sep/15 ] |
|
Hi Bruno, Here is the debugfs output you requested; the rest of the data we can get during the next maintenance window happening in October.

[root@dc2oss01 ~]# debugfs -R "stat O/0/d25/87214969" /dev/mapper/ost_dc2_0 2>&1 | tee /tmp/sr44330_87214969.log
debugfs 1.42.7.wc1 (12-Apr-2013)
O/0/d25/87214969: File not found by ext2_lookup

[root@dc2oss01 ~]# debugfs -R "stat O/0/d6/87215302" /dev/mapper/ost_dc2_0 2>&1 | tee /tmp/sr44330_87215302.log
debugfs 1.42.7.wc1 (12-Apr-2013)
Inode: 94402427 Type: regular Mode: 0666 Flags: 0x80000
Generation: 1758127277 Version: 0x00000000:21734617
User: 754967 Group: 210 Size: 2914894
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 5696
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x55770077:00000000 -- Tue Jun 9 11:04:23 2015
atime: 0x55770086:00000000 -- Tue Jun 9 11:04:38 2015
mtime: 0x55770077:00000000 -- Tue Jun 9 11:04:23 2015
crtime: 0x5577003b:611b92d8 -- Tue Jun 9 11:03:23 2015
Size of extra inode fields: 28
Extended attributes stored in inode body:
fid = "fe 0d 00 5b 76 f2 10 02 00 00 00 00 00 00 00 00 c6 cc 32 05 00 00 00 00 00 00 00 00 00 00 00 00 " (32)
fid: objid=87215302 seq=0 parent=[0x210f2765b000dfe:0x0:0x0] stripe=0
EXTENTS:
(0-255):2009370624-2009370879, (256-711):2009379072-2009379527

Thank You, |
| Comment by Bruno Faccini (Inactive) [ 10/Sep/15 ] |
|
Thanks Manish, |
| Comment by Gerrit Updater [ 11/Sep/15 ] |
|
Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/16373 |
| Comment by Manish Patel (Inactive) [ 11/Sep/15 ] |
|
Hi Bruno, So is that patch the fix, or is it just a debug patch? Thank You, |
| Comment by Peter Jones [ 12/Sep/15 ] |
|
Manish, based on the comment above and the commit message, it seems that this is a debug patch. Peter |
| Comment by Manish Patel (Inactive) [ 14/Sep/15 ] |
|
Hi Bruno, Were you able to create a reproducer, and did you find anything? Thank You, |
| Comment by Bruno Faccini (Inactive) [ 15/Sep/15 ] |
|
Hello Manish, |
| Comment by Bruno Faccini (Inactive) [ 18/Sep/15 ] |
|
Unfortunately I am presently unable to reproduce, even when trying to create scenarios likely to generate the previously suspected msgs... Also, about the 2.1.6 version you have claimed for this problem: the logs you have provided state "Lustre: Build Version: EXAScaler-ddn1.0--PRISTINE-2.6.32-358.11.1.el6_lustre.es359.devel.x86_64"; can you give me more details about the exact content (the tag it is based on, additional patches, ...)? And again, as I have already requested in my 2 previous updates on 7+9/Sep/15, can you provide "exactly the same stuff/infos as what you provided and commented on in this ticket on 04/Aug/15, but from now"? |
| Comment by Manish Patel (Inactive) [ 18/Sep/15 ] |
|
Hi Bruno, Yes, if you can provide that new debug patch in a few days, then we can build new RPMs, test them, and install them at the next maintenance window, which is most likely happening on October/06; at the same time we can collect the same set of data that was provided on 04/Aug/15. Here are more details about this Lustre build and the patches applied to it: https://jira.hpdd.intel.com/secure/attachment/18930/IU_Lustre_Build_Patches_06_08_2015.txt Thank You, |
| Comment by Manish Patel (Inactive) [ 05/Oct/15 ] |
|
Hi Bruno, Let us know if you need anything else other than the below-listed data: a listing of the OBJECTS directory, a copy of the CATALOGS file, and a tarball of the OBJECTS directory. The customer is having downtime tomorrow, so if you have that enhanced version of the patch ready, let us know. Thank You, |
| Comment by Bruno Faccini (Inactive) [ 06/Oct/15 ] |
|
Hello Manish, If you could also make sure to run the whole restart with the default (ioctl neterror warning error emerg ha config console lfsck) debug mask set on the MDS, and then take a full trace at the end of restart/recovery, that would be very helpful. All MDS/OSS syslogs/messages taken some time after the restart will also be required. Last, you can add/use my debug patch available at http://review.whamcloud.com/16373 in the Lustre version to be used for the restart; it will print more debug msgs/infos on some identified code paths that may cause the unlink/setattr LLOG record leakage. |
| Comment by Bruno Faccini (Inactive) [ 06/Oct/15 ] |
|
If the customer agrees to set the MDS debug mask as I have requested (and also to grow the size of the debug log buffer), I need to detail the procedure to apply it in more depth. Also, you can reset the mask and log buffer size to their original values after the filesystem has been fully started.
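For illustration only, the kind of procedure meant here might look as follows; the buffer size and dump file name are placeholders, not the final instructions:

# on the MDS, before restarting the filesystem:
lctl set_param debug="ioctl neterror warning error emerg ha config console lfsck"
lctl set_param debug_mb=1024          # grow the debug log buffer; 1024 is only an example
# ... restart the filesystem and wait until recovery completes ...
lctl dk /tmp/mds_debug_trace.txt      # dump the accumulated debug trace to a file
# finally, restore the original debug and debug_mb values

I will detail the exact values to use. |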
| Comment by Manish Patel (Inactive) [ 06/Oct/15 ] |
|
Hi Bruno, We need more clarification about this part: "Yes, that will be very helpful, but even more if you can get the same infos after filesystem has been fully stopped, and again after filesystem has been fully re-started (and any recovery is completed)." Do you mean that after the full stop we collect the data using ldiskfs, then bring the file system back online, and once recovery is completed we take the file system down again, mount it with ldiskfs, and collect the same set of data? Or do you have another way to collect that data while the file system is up and online, using 'debugfs' or some other method? Can you please clarify this part and provide detailed steps to get that data? Also, we won't be able to deploy the patch today since it has to go through the customer's test lab testing; it will probably be deployed at the next maintenance window. Thank You, |
| Comment by Bruno Faccini (Inactive) [ 06/Oct/15 ] |
|
You will not need to stop the filesystem again; you can safely mount the MDT as ldiskfs, but from the same node/MDS only, and in parallel with normal operations, as long as you don't modify anything on it. You can also use debugfs if you want, but dumping all the OBJECTS/ dir files could be a long process. OK, for the patch, I will let you know if I modify/enhance it in the meantime. |
| Comment by Manish Patel (Inactive) [ 06/Oct/15 ] |
|
Hi Bruno, Here are the logs you requested. http://ddntsr.com/ftp/2015-10-06-SR44330_20151006.tar

It contains the following files:

SR44330_ls-li_MDT_OBJECTS_20151006.txt

The files with a suffix of *b.txt and *b.tar.gz were collected after the customer brought the file system back online. Here are the syslogs if needed. Let us know if you need anything else. Thank You, |
| Comment by Bruno Faccini (Inactive) [ 08/Oct/15 ] |
|
Thanks Manish, I am presently reviewing all of these new infos, will let you know asap about any progress. |
| Comment by Bruno Faccini (Inactive) [ 21/Oct/15 ] |
|
Sorry, it took me some time to parse all these new data ...

00002000:00000001:27.0:1445357157.777095:0:5009:0:(filter_log.c:266:filter_recov_log_mds_ost_cb()) Process entered
00002000:00000001:27.0:1445357157.777097:0:5009:0:(filter_log.c:166:filter_recov_log_unlink_cb()) Process entered
00002000:00000010:27.0:1445357157.777099:0:5009:0:(filter_log.c:169:filter_recov_log_unlink_cb()) slab-alloced '((oa))': 208 at ffff8808362f4cb0.
00002000:00000001:27.0:1445357157.777103:0:5009:0:(filter.c:4119:filter_destroy()) Process entered
00002000:00000001:27.0:1445357157.777105:0:5009:0:(filter_capa.c:127:filter_auth_capa()) Process entered
00002000:00000001:27.0:1445357157.777107:0:5009:0:(filter_capa.c:141:filter_auth_capa()) Process leaving (rc=0 : 0 : 0)
00002000:00000002:27.0:1445357157.777110:0:5009:0:(filter.c:4133:filter_destroy()) lustre-OST0000: filter_destroy(group=0,oid=2)
00002000:00000001:27.0:1445357157.777113:0:5009:0:(filter.c:1474:filter_fid2dentry()) Process entered
00002000:00000002:27.0:1445357157.777116:0:5009:0:(filter.c:1499:filter_fid2dentry()) looking up object O/d2/2
00002000:00000002:27.0:1445357157.777177:0:5009:0:(filter.c:1518:filter_fid2dentry()) got child objid 2: ffff880834151740, count = 1
00002000:00000001:27.0:1445357157.777180:0:5009:0:(filter.c:1522:filter_fid2dentry()) Process leaving (rc=18446612167547754304 : -131906161797312 : ffff880834151740)
00002000:00000002:27.0:1445357157.777183:0:5009:0:(filter.c:4142:filter_destroy()) destroying non-existent object 2:0
00002000:00000001:27.0:1445357157.777185:0:5009:0:(filter.c:4161:filter_destroy()) Process leaving via cleanup (rc=18446744073709551614 : -2 : 0xfffffffffffffffe)
00002000:00000002:27.0:1445357157.777187:0:5009:0:(filter.c:221:f_dput()) putting 2: ffff880834151740, count = 0
00002000:00000001:27.0:1445357157.777191:0:5009:0:(lustre_quota.h:653:lquota_adjust()) Process entered
00040000:00000001:27.0:1445357157.777193:0:5009:0:(quota_master.c:577:filter_quota_adjust()) Process entered
00040000:00000001:27.0:1445357157.777195:0:5009:0:(quota_master.c:580:filter_quota_adjust()) Process leaving (rc=0 : 0 : 0)
00002000:00000001:27.0:1445357157.777197:0:5009:0:(lustre_quota.h:657:lquota_adjust()) Process leaving (rc=0 : 0 : 0)
00002000:00000010:27.0:1445357157.777198:0:5009:0:(filter_log.c:201:filter_recov_log_unlink_cb()) slab-freed '((oa))': 208 at ffff8808362f4cb0.
00002000:00000001:27.0:1445357157.777200:0:5009:0:(filter_log.c:203:filter_recov_log_unlink_cb()) Process leaving (rc=0 : 0 : 0)
00002000:00000001:27.0:1445357157.777202:0:5009:0:(filter_log.c:313:filter_recov_log_mds_ost_cb()) Process leaving (rc=0 : 0 : 0)

According to the code in b2_1, this should occur by calling either llog_cancel() or filter_cancel_cookies_cb(), directly or as a callback, but only if OBD_MD_FLCOOKIE is not set in the obdo valid flags; and this stands for both the unlink and setattr record cases:
int filter_setattr_internal(struct obd_export *exp, struct dentry *dentry,
struct obdo *oa, struct obd_trans_info *oti)
{
unsigned int orig_ids[MAXQUOTAS] = {0, 0};
struct llog_cookie *fcc = NULL;
struct filter_obd *filter;
int rc, err, sync = 0;
....................
if (oa->o_valid & OBD_MD_FLCOOKIE) {
OBD_ALLOC(fcc, sizeof(*fcc));
if (fcc != NULL)
*fcc = oa->o_lcookie;
}
...................
if (oa->o_valid & OBD_MD_FLFLAGS) {
rc = fsfilt_iocontrol(exp->exp_obd, dentry,
FSFILT_IOC_SETFLAGS, (long)&oa->o_flags);
} else {
rc = fsfilt_setattr(exp->exp_obd, dentry, handle, &iattr, 1);
if (fcc != NULL)
/* set cancel cookie callback function */
sync = fsfilt_add_journal_cb(exp->exp_obd, 0, handle,
filter_cancel_cookies_cb, <<<<<<<<<<<<<<<<<<<<<<<
fcc);
}
...................
if (sync) {
filter_cancel_cookies_cb(exp->exp_obd, 0, fcc, rc);
fcc = NULL;
}
..........................
int filter_destroy(struct obd_export *exp, struct obdo *oa,
struct lov_stripe_md *md, struct obd_trans_info *oti,
struct obd_export *md_exp, void *capa)
{
unsigned int qcids[MAXQUOTAS] = {0, 0};
struct obd_device *obd;
struct filter_obd *filter;
struct dentry *dchild = NULL, *dparent = NULL;
struct lustre_handle lockh = { 0 };
struct lvfs_run_ctxt saved;
void *handle = NULL;
struct llog_cookie *fcc = NULL;
int rc, rc2, cleanup_phase = 0, sync = 0;
struct iattr iattr;
unsigned long now;
ENTRY;
rc = filter_auth_capa(exp, NULL, oa->o_seq,
(struct lustre_capa *)capa, CAPA_OPC_OSS_DESTROY);
if (rc)
RETURN(rc);
obd = exp->exp_obd;
filter = &obd->u.filter;
push_ctxt(&saved, &obd->obd_lvfs_ctxt, NULL);
cleanup_phase = 1;
CDEBUG(D_INODE, "%s: filter_destroy(group="LPU64",oid="
LPU64")\n", obd->obd_name, oa->o_seq, oa->o_id);
dchild = filter_fid2dentry(obd, NULL, oa->o_seq, oa->o_id);
if (IS_ERR(dchild))
GOTO(cleanup, rc = PTR_ERR(dchild));
cleanup_phase = 2;
if (dchild->d_inode == NULL) {
CDEBUG(D_INODE, "destroying non-existent object "POSTID"\n",
oa->o_id, oa->o_seq);
/* If object already gone, cancel cookie right now */
if (oa->o_valid & OBD_MD_FLCOOKIE) {
struct llog_ctxt *ctxt;
struct obd_llog_group *olg;
olg = filter_find_olg(obd, oa->o_seq);
if (!olg) {
CERROR("%s: can not find olg of group %d "
"for objid "LPU64"\n", obd->obd_name,
(int)oa->o_seq, oa->o_id);
GOTO(cleanup, rc = PTR_ERR(olg));
}
fcc = &oa->o_lcookie;
ctxt = llog_group_get_ctxt(olg, fcc->lgc_subsys + 1);
llog_cancel(ctxt, NULL, 1, fcc, 0); <<<<<<<<<<<<<<<
llog_ctxt_put(ctxt);
fcc = NULL; /* we didn't allocate fcc, don't free it */
}
GOTO(cleanup, rc = -ENOENT);
}
rc = filter_prepare_destroy(obd, oa->o_id, oa->o_seq, &lockh);
if (rc)
GOTO(cleanup, rc);
/* Our MDC connection is established by the MDS to us */
if (oa->o_valid & OBD_MD_FLCOOKIE) {
OBD_ALLOC(fcc, sizeof(*fcc));
if (fcc != NULL)
*fcc = oa->o_lcookie; <<<<<<<<<<<<<<<<<<<<<<<<<<<<
}
/* we're gonna truncate it first in order to avoid possible deadlock:
* P1 P2
* open trasaction open transaction
* down(i_zombie) down(i_zombie)
* restart transaction
* (see BUG 4180) -bzzz
*
* take i_alloc_sem too to prevent other threads from writing to the
* file while we are truncating it. This can cause lock ordering issue
* between page lock, i_mutex & starting new journal handle.
* (see bug 20321) -johann
*/
now = jiffies;
down_write(&dchild->d_inode->i_alloc_sem);
LOCK_INODE_MUTEX(dchild->d_inode);
fsfilt_check_slow(exp->exp_obd, now, "i_alloc_sem and i_mutex");
/* VBR: version recovery check */
rc = filter_version_get_check(exp, oti, dchild->d_inode);
if (rc) {
UNLOCK_INODE_MUTEX(dchild->d_inode);
up_write(&dchild->d_inode->i_alloc_sem);
GOTO(cleanup, rc);
}
handle = fsfilt_start_log(obd, dchild->d_inode, FSFILT_OP_SETATTR,
NULL, 1);
if (IS_ERR(handle)) {
UNLOCK_INODE_MUTEX(dchild->d_inode);
up_write(&dchild->d_inode->i_alloc_sem);
GOTO(cleanup, rc = PTR_ERR(handle));
}
/* Locking order: i_mutex -> journal_lock -> dqptr_sem. LU-952 */
ll_vfs_dq_init(dchild->d_inode);
iattr.ia_valid = ATTR_SIZE;
iattr.ia_size = 0;
rc = fsfilt_setattr(obd, dchild, handle, &iattr, 1);
rc2 = fsfilt_commit(obd, dchild->d_inode, handle, 0);
UNLOCK_INODE_MUTEX(dchild->d_inode);
up_write(&dchild->d_inode->i_alloc_sem);
if (rc)
GOTO(cleanup, rc);
if (rc2)
GOTO(cleanup, rc = rc2);
/* We don't actually need to lock the parent until we are unlinking
* here, and not while truncating above. That avoids holding the
* parent lock for a long time during truncate, which can block other
* threads from doing anything to objects in that directory. bug 7171 */
dparent = filter_parent_lock(obd, oa->o_seq, oa->o_id);
if (IS_ERR(dparent))
GOTO(cleanup, rc = PTR_ERR(dparent));
cleanup_phase = 3; /* filter_parent_unlock */
LOCK_INODE_MUTEX(dchild->d_inode);
handle = fsfilt_start_log(obd, dparent->d_inode,FSFILT_OP_UNLINK,oti,1);
if (IS_ERR(handle)) {
UNLOCK_INODE_MUTEX(dchild->d_inode);
GOTO(cleanup, rc = PTR_ERR(handle));
}
cleanup_phase = 4; /* fsfilt_commit */
/* Quota release need uid/gid of inode */
obdo_from_inode(oa, dchild->d_inode, NULL, OBD_MD_FLUID|OBD_MD_FLGID);
filter_fmd_drop(exp, oa->o_id, oa->o_seq);
/* this drops dchild->d_inode->i_mutex unconditionally */
rc = filter_destroy_internal(obd, oa->o_id, oa->o_seq, dparent, dchild);
EXIT;
cleanup:
switch(cleanup_phase) {
case 4:
if (fcc != NULL)
sync = fsfilt_add_journal_cb(obd, 0, oti ?
oti->oti_handle : handle,
filter_cancel_cookies_cb, <<<<<<<<<<<<<<<<<<<<<<<<<<<<
fcc);
/* If add_journal_cb failed, then filter_finish_transno
* will commit the handle and we will do a sync
* on commit. then we call callback directly to free
* the fcc.
*/
rc = filter_finish_transno(exp, NULL, oti, rc, sync);
if (sync) {
filter_cancel_cookies_cb(obd, 0, fcc, rc); <<<<<<<<<<<<<<<<<<<<<<<<<<
fcc = NULL;
}
rc2 = fsfilt_commit(obd, dparent->d_inode, handle, 0);
if (rc2) {
CERROR("error on commit, err = %d\n", rc2);
if (!rc)
rc = rc2;
} else {
fcc = NULL;
}
case 3:
filter_parent_unlock(dparent);
case 2:
filter_fini_destroy(obd, &lockh);
f_dput(dchild);
if (fcc != NULL)
OBD_FREE(fcc, sizeof(*fcc));
case 1:
pop_ctxt(&saved, &obd->obd_lvfs_ctxt, NULL);
break;
default:
CERROR("invalid cleanup_phase %d\n", cleanup_phase);
LBUG();
}
/* trigger quota release */
qcids[USRQUOTA] = oa->o_uid;
qcids[GRPQUOTA] = oa->o_gid;
rc2 = lquota_adjust(filter_quota_interface_ref, obd, qcids, NULL, rc,
FSFILT_OP_UNLINK);
if (rc2)
CERROR("filter adjust qunit! (rc:%d)\n", rc2);
return rc;
}
Then, having a look at the code of the callers:

/* Callback for processing the unlink log record received from MDS by
* llog_client_api. */
static int filter_recov_log_unlink_cb(struct llog_ctxt *ctxt,
struct llog_rec_hdr *rec,
struct llog_cookie *cookie)
{
struct obd_export *exp = ctxt->loc_obd->obd_self_export;
struct llog_unlink_rec *lur;
struct obdo *oa;
obd_count count;
int rc = 0;
ENTRY;
lur = (struct llog_unlink_rec *)rec;
OBDO_ALLOC(oa);
if (oa == NULL)
RETURN(-ENOMEM);
oa->o_valid |= OBD_MD_FLCOOKIE; <<<<<<<<<<<<<<<<<<<<<<<<<
oa->o_id = lur->lur_oid;
oa->o_seq = lur->lur_oseq;
oa->o_valid = OBD_MD_FLID | OBD_MD_FLGROUP; <<<<<<<<<<<<<<<<<< bad !!!
oa->o_lcookie = *cookie;
/* objid gap may require to destroy several objects in row */
count = lur->lur_count + 1;
/* This check is only valid before FID-on-OST and it should
* be removed after FID-on-OST is implemented */
if (oa->o_seq > FID_SEQ_OST_MAX) {
CERROR("%s: invalid group number "LPU64" > MAX_CMD_GROUP %u\n",
exp->exp_obd->obd_name, oa->o_seq, FID_SEQ_OST_MAX);
RETURN(-EINVAL);
}
while (count > 0) {
rc = filter_destroy(exp, oa, NULL, NULL, NULL, NULL);
if (rc == 0)
CDEBUG(D_RPCTRACE, "object "LPU64" is destroyed\n",
oa->o_id);
else if (rc != -ENOENT)
CEMERG("error destroying object "LPU64": %d\n",
oa->o_id, rc);
else
rc = 0;
count--;
oa->o_id++;
}
OBDO_FREE(oa);
RETURN(rc);
}
/* Callback for processing the setattr log record received from MDS by
* llog_client_api. */
static int filter_recov_log_setattr_cb(struct llog_ctxt *ctxt,
struct llog_rec_hdr *rec,
struct llog_cookie *cookie)
{
struct obd_device *obd = ctxt->loc_obd;
struct obd_export *exp = obd->obd_self_export;
struct obd_info oinfo = { { { 0 } } };
obd_id oid;
int rc = 0;
ENTRY;
OBDO_ALLOC(oinfo.oi_oa);
if (oinfo.oi_oa == NULL)
RETURN(-ENOMEM);
if (rec->lrh_type == MDS_SETATTR_REC) {
struct llog_setattr_rec *lsr = (struct llog_setattr_rec *)rec;
oinfo.oi_oa->o_id = lsr->lsr_oid;
oinfo.oi_oa->o_seq = lsr->lsr_oseq;
oinfo.oi_oa->o_uid = lsr->lsr_uid;
oinfo.oi_oa->o_gid = lsr->lsr_gid;
} else {
struct llog_setattr64_rec *lsr = (struct llog_setattr64_rec *)rec;
oinfo.oi_oa->o_id = lsr->lsr_oid;
oinfo.oi_oa->o_seq = lsr->lsr_oseq;
oinfo.oi_oa->o_uid = lsr->lsr_uid;
oinfo.oi_oa->o_gid = lsr->lsr_gid;
}
oinfo.oi_oa->o_valid |= (OBD_MD_FLID | OBD_MD_FLUID | OBD_MD_FLGID |
OBD_MD_FLCOOKIE); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
oinfo.oi_oa->o_valid = OBD_MD_FLID | OBD_MD_FLGROUP; <<<<<<<<<<<<<<<<<<<<< bad !!
oinfo.oi_oa->o_lcookie = *cookie;
oid = oinfo.oi_oa->o_id;
rc = filter_setattr(exp, &oinfo, NULL);
OBDO_FREE(oinfo.oi_oa);
if (rc == -ENOENT) {
CDEBUG(D_RPCTRACE, "object already removed, send cookie\n");
llog_cancel(ctxt, NULL, 1, cookie, 0);
RETURN(0);
}
if (rc == 0)
CDEBUG(D_RPCTRACE, "object "LPU64" is chown/chgrp\n", oid);
RETURN(rc);
}
I will push a b2_1 patch to fix this now and see how it works.
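The idea of the fix, sketched here only on the two offending assignments (the actual patch may differ in its final form):

/* filter_recov_log_unlink_cb(): OR the flags in, so the OBD_MD_FLCOOKIE
 * set a few lines above survives */
oa->o_valid |= OBD_MD_FLID | OBD_MD_FLGROUP;            /* was "=" */

/* filter_recov_log_setattr_cb(): the same one-character change */
oinfo.oi_oa->o_valid |= OBD_MD_FLID | OBD_MD_FLGROUP;   /* was "=" */

With the cookie flag preserved, filter_destroy()/filter_setattr() will reach filter_cancel_cookies_cb()/llog_cancel() as intended, and the unlink/setattr records will be cancelled from the LLOGs instead of leaking. |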
| Comment by Gerrit Updater [ 21/Oct/15 ] |
|
Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/16906 |
| Comment by Bruno Faccini (Inactive) [ 02/Nov/15 ] |
|
Debug patch at http://review.whamcloud.com/16373 has been abandoned. |
| Comment by Manish Patel (Inactive) [ 04/Nov/15 ] |
|
Hi Bruno, I see that the patch http://review.whamcloud.com/16373 is abandoned and there is one more patch in progress, http://review.whamcloud.com/#/c/16906/; do you want us to apply either of those patches, and do you need anything from our end? Also, were you able to reproduce these issues in your test lab, and if so, did your fix patch work to resolve them? Thank You, |
| Comment by Bruno Faccini (Inactive) [ 05/Nov/15 ] |
|
Hello Manish, |
| Comment by Oz Rentas [ 05/Sep/17 ] |
|
This is quite old and no longer an active issue. Please close. |
| Comment by Peter Jones [ 05/Sep/17 ] |
|
ok - thanks Oz |
| Comment by Gerrit Updater [ 26/Jul/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/15654/ |
| Comment by Gerrit Updater [ 23/Aug/22 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/48309 |
| Comment by Gerrit Updater [ 17/Oct/22 ] |
|
"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48900 |
| Comment by Gerrit Updater [ 02/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48900/ |