[LU-7867] OI scrubber causing performance issues Created: 12/Mar/16  Updated: 15/Jun/16  Resolved: 09/May/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.5
Fix Version/s: None

Type: Question/Request Priority: Major
Reporter: Dustin Leverman Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-7933 debugfs stat reports invalid EA entry... Resolved
Related
is related to LU-7378 Error destroying object with RC115 Resolved

 Description   

OI scrubber causing performance issues.

  • There are orphaned objects that are causing the OI scrubber to start.
  • Is there an online or offline way of running lfsck at the moment?


 Comments   
Comment by Jian Yu [ 12/Mar/16 ]

Hi Dustin,

I checked the question with nasf, who is the expert on LFSCK and OI scrub. He had some follow-up questions:

  1. What kind of orphan object is it, an orphan MDT-object or an orphan OST-object? Is it the case where the MDT-object has been unlinked but the related OST-object has not been destroyed? If so, that is an orphan OST-object, and it can be repaired via layout LFSCK. Unfortunately, layout LFSCK for orphan OST-objects is only available since Lustre 2.6.
  2. How is the OI scrub triggered, manually or automatically? If the latter, it means some inconsistency was detected during normal RPC handling. OI scrub/LFSCK are online tools, meaning applications are allowed to keep running on the clients while OI scrub/LFSCK scan. However, in that case, if a FID-based RPC tries to access a bad OI mapping, the client gets -115 (-EINPROGRESS) and retries the RPC until the related OI mapping is fixed. Such processing is invisible to the application, which simply appears to become very slow.

Could you please check whether the second case is the one you are hitting? Thank you.
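
For reference (assuming the ldiskfs OSD parameter names), the current scrub state can be checked on the OSS with:

lctl get_param osd-ldiskfs.*.oi_scrub
lctl get_param osd-ldiskfs.*.auto_scrub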

Comment by Andreas Dilger [ 12/Mar/16 ]

Please provide specific error messages from the console log, otherwise it is difficult to know what actual problem is being hit.

Comment by Dustin Leverman [ 14/Mar/16 ]

Jian,
This is an orphaned OST object, and the MDT object appears to have already been unlinked. Since we are running Lustre 2.5.5+, we do not have the lfsck functionality we would need to fix it.

The OI scrub is being triggered automatically with messages like these every 5 minutes or so:

Mar 14 12:48:25 f1-oss1d8 kernel: [1379308.219915] Lustre: f1-OST00bf-osd: trigger OI scrub by RPC for [0x100000000:0x33d61e8:0x0], rc = 0 [1]
Mar 14 12:48:25 f1-oss1d8 kernel: [1379308.242266] LustreError: 117664:0:(ofd_obd.c:1096:ofd_destroy()) f1-OST00bf: error destroying object [0x100000000:0x33d61e8:0x0]: -115
Mar 14 12:48:26 f1-oss1d8 kernel: [1379309.219224] LustreError: 117664:0:(ofd_obd.c:1096:ofd_destroy()) f1-OST00bf: error destroying object [0x100000000:0x3304752:0x0]: -115

We have had to fix this a few times before by manually removing those objects; however, we were wondering if we could use lfsck to fix this problem so we don't keep running into it. Is either online or offline lfsck supported in our version of Lustre?

Thanks,
Dustin

Comment by Dustin Leverman [ 14/Mar/16 ]

To address the performance impact of having the OI scrubber start every 5 minutes, is it safe to disable the OI scrubber for that OST until we come up with a solution?

lctl set_param osd-ldiskfs.f1-OST00bf.auto_scrub=0

Thanks,
Dustin

Comment by nasf (Inactive) [ 14/Mar/16 ]

Disabling auto_scrub (setting it to 0) will cause the related RPC handler to return -78 (-EREMCHG) to the sender directly if an inconsistent OI mapping is detected. That avoids the slow performance, but the related application may fail because of the -78 error. According to the logs you pasted, it is the destroy of the OST-object that triggered the OI scrub, and the failure to destroy the OST-object is what left the orphan.

In fact, I suspect that the OI mapping is valid, and that an improper check misguided the OI scrub. I will try to make a patch to fix that.

Comment by Gerrit Updater [ 15/Mar/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/18912
Subject: LU-7867 osd-ldiskfs: compatible check for IDIF OI consistency
Project: fs/lustre-release
Branch: b2_5_fe
Current Patch Set: 1
Commit: 4770e4f44b9efefeb1673d9f87d4813067b6e0c6

Comment by nasf (Inactive) [ 15/Mar/16 ]

Dustin,

Patch 18912 fixes the false detection of an inconsistent OI mapping that triggers the OI scrub. With it, you do NOT need to disable auto_scrub, and performance should recover. However, for the orphan OST-objects that already exist, this patch cannot help. Since Lustre 2.5.5 does not support the new online layout LFSCK, you have to use the old lfsck (for details, please check the "27.2 lfsck" section of the old Lustre manual) to find them. If you do NOT care about some orphan OST-objects (which waste some OST space, depending on the orphans' sizes), then just apply this patch directly.
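
A rough sketch of the old offline lfsck workflow, with placeholder device names and database paths (please follow the manual section above for the authoritative steps):

# Build the MDS database from the unmounted MDT:
e2fsck -n -v --mdsdb /tmp/mdsdb /dev/<mdt_device>
# Build a database for each OST to be checked:
e2fsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-00bf /dev/<ost_device>
# Run lfsck from a client with the filesystem mounted:
lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-00bf /lustre/f1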

Comment by Matt Ezell [ 15/Mar/16 ]

Hi Nasf-

Thanks for the patch. I think I'm missing something... This file system was formatted under Lustre 2.4 and upgraded to 2.5. We have never run anything newer here*, so I would expect all of our objects to be old-style IDIF (OST index set to 0). As I understand it, the patch will now equate old-style and new-style IDIF objects to prevent the scrubber from starting due to this difference - this is good. But where are the new-style objids coming from? We have had ldiskfs issues on this OST before, might some version of e2fsck have "fixed" the LMA for us?

Is this confusion between old-style and new-style IDIFs also causing the "error destroying object" -115 message, or is that caused by something else? I understand that this patch cannot recover objects that have already been orphaned, but I want to make sure we don't keep accumulating orphans.

Thanks!

*Note: The servers have only ever been 2.4.x or 2.5.x. Clients have been 1.8.x, 2.4.x, 2.5.x, and 2.7.x. It's possible that a test client may have used a 2.8 pre-release at some point.

Comment by nasf (Inactive) [ 15/Mar/16 ]

The IDIF (for the OST-object) stored in the LMA EA on disk is in the old format, meaning the index is always 0, but the index in the IDIF passed down from the upper layer to look up the OST-object may be non-zero. Currently, I cannot be 100% sure that this difference caused your trouble, but it is a likely suspect, and when it happens it will cause exactly this kind of trouble. This patch verifies and prevents such a case. The patch does not change any on-disk data, so applying it is harmless for your system.

On the other hand, the data on your disk remains in the old format; nothing changed it. When the OI scrub was triggered on your system, it tried to update the FID-in-LMA EA, but the newly generated value was still zero-indexed, the same as the original, so there was effectively nothing to update. That is why you may see the OI scrub triggered repeatedly by the same IDIF, periodically.

The "error destroying object" with -115 message was returned by the osd_fid_lookup() when locating the OST-object for destroying. Before the OI scrub completed, the OSD could not know which local inode was mapped to the given IDIF, so the destroying operation was blocked or failed. That is why the orphan OST-object generated. This patch will filter out the index difference and unnecessary prevent OI scrub, then avoid more OST-object orphans.

Comment by Matt Ezell [ 15/Mar/16 ]

Thanks for the update. I noticed from your patch that there's already a CDEBUG for D_INODE in osd_check_lma so I added inode to the debug string and waited for this to trigger again. I dumped the log and found the following:

00000010:00000002:5.0:1458059877.008993:0:21045:0:(ost_handler.c:2398:ost_handle()) destroy
00002000:00080000:5.0:1458059877.008998:0:21045:0:(ofd_obd.c:1075:ofd_destroy()) f1-OST00bf: Destroy object 0x0:54354408 count 1
00000004:00000002:5.0:1458059877.009023:0:21045:0:(osd_handler.c:491:osd_check_lma()) f1-OST00bf: FID [0x100000000:0x33d61e8:0x0] != self_fid [0x100000000:0x320e401:0x0]
00000004:02000400:5.0:1458059877.009110:0:21045:0:(osd_handler.c:589:osd_fid_lookup()) f1-OST00bf-osd: trigger OI scrub by RPC for [0x100000000:0x33d61e8:0x0], rc = 0 [1]
00000010:00000002:5.0:1458059878.032100:0:21052:0:(ost_handler.c:2398:ost_handle()) destroy
00002000:00080000:5.0:1458059878.032104:0:21052:0:(ofd_obd.c:1075:ofd_destroy()) f1-OST00bf: Destroy object 0x0:53495634 count 1
00000004:00000002:5.0:1458059878.032126:0:21052:0:(osd_handler.c:491:osd_check_lma()) f1-OST00bf: FID [0x100000000:0x3304752:0x0] != self_fid [0x100000000:0x320e400:0x0]
00002000:00020000:5.0:1458059878.032129:0:21052:0:(ofd_obd.c:1096:ofd_destroy()) f1-OST00bf: error destroying object [0x100000000:0x3304752:0x0]: -115

Do those look like the same IDIF just with different indexes? I don't know how that information is packed in there, but they don't look like they match to me.
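
For what it's worth, assuming the standard IDIF packing (OST index in bits 16-31 of the sequence, object ID formed from the low 16 bits of the sequence combined with the OID field), a rough decode suggests they are two different objects rather than the same object with different indexes:

# hypothetical decode, assuming ost_idx = (seq >> 16) & 0xffff and
#                                objid   = ((seq & 0xffff) << 32) | oid
for fid in 0x100000000:0x33d61e8 0x100000000:0x320e401; do
  seq=${fid%%:*}; oid=${fid##*:}
  printf '%s -> ost_idx=%d objid=%d\n' "$fid" \
    $(( (seq >> 16) & 0xffff )) $(( ((seq & 0xffff) << 32) | oid ))
done
# 0x100000000:0x33d61e8 -> ost_idx=0 objid=54354408 (O/0/d8/54354408)
# 0x100000000:0x320e401 -> ost_idx=0 objid=52487169 (O/0/d1/52487169)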

Here's some information about that first object:

[root@f1-oss1d8 ~]# debugfs -c -R 'stat O/0/d8/54354408' /dev/mapper/f1-ddn1d-l53
debugfs 1.42.13.wc4 (28-Nov-2015)
/dev/mapper/f1-ddn1d-l53: catastrophic mode - not reading inode or group bitmaps
Inode: 562889   Type: regular    Mode:  0666   Flags: 0x80000
Generation: 833716884    Version: 0x0000000e:00c7ec1c
User:   800   Group:   502   Size: 1048576
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 2048
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
 atime: 0x566c289a:c40af440 -- Sat Dec 12 14:00:58 2015
 mtime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
crtime: 0x552a3831:79d980f4 -- Sun Apr 12 09:17:37 2015
Size of extra inode fields: 28
Extended attributes stored in inode body: 
invalid EA entry in inode
EXTENTS:
(0-254):17259-17513, (255):17514
[root@f1-oss1d8 ~]# rpm -qf /sbin/debugfs
e2fsprogs-1.42.13.wc4-7.el6.x86_64

What does "invalid EA entry in inode" mean? Is this object damaged?

Comment by Matt Ezell [ 15/Mar/16 ]

Err...

[root@f1-oss1d8 ~]# debugfs -c -R 'stat O/0/d8/54354408' /dev/mapper/f1-ddn1d-l53
debugfs 1.42.13.wc4 (28-Nov-2015)
/dev/mapper/f1-ddn1d-l53: catastrophic mode - not reading inode or group bitmaps
Inode: 562889   Type: regular    Mode:  0666   Flags: 0x80000
Generation: 833716884    Version: 0x0000000e:00c7ec1c
User:   800   Group:   502   Size: 1048576
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 2048
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
 atime: 0x566c289a:c40af440 -- Sat Dec 12 14:00:58 2015
 mtime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
crtime: 0x552a3831:79d980f4 -- Sun Apr 12 09:17:37 2015
Size of extra inode fields: 28
Extended attributes stored in inode body: 
invalid EA entry in inode
EXTENTS:
(0-254):17259-17513, (255):17514
[root@f1-oss1d8 ~]# debugfs -c -R 'stat O/0/d1/52487169' /dev/mapper/f1-ddn1d-l53
debugfs 1.42.13.wc4 (28-Nov-2015)
/dev/mapper/f1-ddn1d-l53: catastrophic mode - not reading inode or group bitmaps
Inode: 562889   Type: regular    Mode:  0666   Flags: 0x80000
Generation: 833716884    Version: 0x0000000e:00c7ec1c
User:   800   Group:   502   Size: 1048576
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 2048
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
 atime: 0x566c289a:c40af440 -- Sat Dec 12 14:00:58 2015
 mtime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
crtime: 0x552a3831:79d980f4 -- Sun Apr 12 09:17:37 2015
Size of extra inode fields: 28
Extended attributes stored in inode body: 
invalid EA entry in inode
EXTENTS:
(0-254):17259-17513, (255):17514

Note both objects claim to be inode 562889 but the link count is 1.

Comment by Andreas Dilger [ 15/Mar/16 ]

I was going to suggest checking object O/0/d0/52487168 for the same reason.

At this point, I would recommend taking this OST offline and running "e2fsck -fn" to see if there is corruption. I suspect that there will be at least some shared blocks, since it appears that at least one inode table block has been written to the wrong location, or there is some corruption in the directory entries. It is also unfortunate that the xattr data is reported as corrupted, since it could otherwise be used to determine which inodes are the right ones and which are being referenced incorrectly. This can at least partially be checked from the kernel logs.

Comment by Andreas Dilger [ 15/Mar/16 ]

Depending on what kind of corruption is detected by e2fsck, it might be best to just let "e2fsck -fy" run and fix the problems, then run ll_recover_lost_found_objs to recover anything left in lost+found.
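
A rough outline of that procedure, with the mountpoint as a placeholder (the device name is taken from the earlier debugfs commands):

# With the OST unmounted:
e2fsck -fn /dev/mapper/f1-ddn1d-l53    # read-only pass first, to see what it finds
e2fsck -fy /dev/mapper/f1-ddn1d-l53    # then fix, if corruption is reported
# Mount as ldiskfs and restore any objects that ended up in lost+found:
mount -t ldiskfs /dev/mapper/f1-ddn1d-l53 /mnt/ost
ll_recover_lost_found_objs -d /mnt/ost/lost+found
umount /mnt/ost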

Have any corruption or other disk errors been reported on this system?

Comment by Dustin Leverman [ 15/Mar/16 ]

Andreas,
This LUN has had corruption issues in the past (last April, to be exact). We did run e2fsck on it until it ran clean; however, that was with an older version of e2fsprogs. Perhaps the new version would find issues that the old version could not, and we can check that. It will take a week or two to schedule a downtime with the customer to perform maintenance.

I don't know if this is telling or not, but debugfs -c -R 'stats' <LUN> reports that the LUN is clean. Also, we did not open a Jira ticket on this last corruption issue; it was handled with DDN.

Thanks,
Dustin

Comment by nasf (Inactive) [ 16/Mar/16 ]

[root@f1-oss1d8 ~]# debugfs -c -R 'stat O/0/d8/54354408' /dev/mapper/f1-ddn1d-l53
debugfs 1.42.13.wc4 (28-Nov-2015)
/dev/mapper/f1-ddn1d-l53: catastrophic mode - not reading inode or group bitmaps
Inode: 562889 Type: regular Mode: 0666 Flags: 0x80000
Generation: 833716884 Version: 0x0000000e:00c7ec1c
User: 800 Group: 502 Size: 1048576
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 2048
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
atime: 0x566c289a:c40af440 -- Sat Dec 12 14:00:58 2015
mtime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
crtime: 0x552a3831:79d980f4 -- Sun Apr 12 09:17:37 2015
Size of extra inode fields: 28
Extended attributes stored in inode body:
invalid EA entry in inode
EXTENTS:
(0-254):17259-17513, (255):17514

That means the OI mapping for FID [0x100000000:0x33d61e8:0x0] maps to inode 562889 via the name entry "/O/0/d8/54354408", but that name entry is wrong. The inode is properly referenced by the name entry "/O/0/d1/52487169", which has been verified via the FID-in-LMA. The "invalid EA entry in inode" message does not mean the inode is corrupted, because the OSD could read the LMA EA. I would suggest updating e2fsprogs, which may be helpful; and if we can get the PFID EA from inode 562889, then we can know whether the related MDT-object is still there or not.

A Lustre OST-object is always singly referenced, so the unexpected mapping entry "/O/0/d8/54354408" is wrong. I am not sure how the corruption was generated, but your site is not the first to hit such trouble. In any case, running e2fsck to fix the disk-level corruption is the first step.

Comment by Matt Ezell [ 16/Mar/16 ]

The check that is causing debugfs to mark the EA data as invalid is

(!ea_inode && value + entry->e_value_size >= end)

because for LMA, entry->e_value_size = 24 and value + entry->e_value_size = end. I ran debugfs under gdb and manually set entry->e_value_size = 23. This resulted in the following output:

Inode: 562889   Type: regular    Mode:  0666   Flags: 0x80000
Generation: 833716884    Version: 0x0000000e:00c7ec1c
User:   800   Group:   502   Size: 1048576
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 2048
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
 atime: 0x566c289a:c40af440 -- Sat Dec 12 14:00:58 2015
 mtime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
crtime: 0x552a3831:79d980f4 -- Sun Apr 12 09:17:37 2015
Size of extra inode fields: 28
Extended attributes stored in inode body: 
  lma = "08 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 e4 20 03 00 00 00 " (23)
  fid = "f8 95 00 22 29 03 00 02 00 00 00 00 02 00 00 00 " (16)
  fid: parent=[0x2000329220095f8:0x0:0x0] stripe=2
EXTENTS:
(0-254):17259-17513, (255):17514

I don't think that formatting for the parent fid is correct, and feeding it directly into fid2path results in invalid argument. I think the sequence provided is actually the sequence+objid (is this a bug? should this print a 'valid' fid?) so I tried:

# lfs fid2path /lustre/f1 0x200032922:0x95f8:0x0
fid2path: error on FID 0x200032922:0x95f8:0x0: No such file or directory

This is expected, since the RPC that causes this OI lookup is trying to destroy the object anyway. My guess is that the MDS has deleted its inode and added all the objects to the llog for removal. llog processing sent an unlink RPC to the OST, which returned -115 due to this damaged object. I'm not familiar with llog processing, but I would hazard a guess that it retries the unlink (since we are seeing this RPC every ~600 seconds).

So I guess the options at this point are:

  • Run the newer e2fsck, hope it finds and fixes or unlinks these problematic objects
  • Just unlink these objects, since that's what the MDS is trying to do anyway

Since this has happened before (see LU-7378) we are worried that this might keep happening. Is there any way to scan the objects and determine if any more are in this state?

Comment by nasf (Inactive) [ 17/Mar/16 ]

The output of debugfs seems wrong, because the OSD got the self FID as [0x100000000:0x320e401:0x0], but the debugfs output is: lma = "08 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 e4 20 03 00 00 00 " (23). So the value offset is wrong.

Which version of e2fsprogs are you using? Is it possible to try a newer e2fsprogs?
Until the debugfs issue is resolved, please do NOT remove these OST-objects manually, because we are not sure whether they are really the targets to be destroyed.

Comment by Matt Ezell [ 17/Mar/16 ]

Note from my previous comment that I manually set the value length to 23 to avoid the 'invalid EA' message

I ran debugfs under gdb and manually set entry->e_value_size = 23.

We are using the latest version available from Intel: e2fsprogs-1.42.13.wc4-7.el6.x86_64.rpm

[root@f1-oss1d8 ~]# rpm -qi e2fsprogs
Name        : e2fsprogs                    Relocations: (not relocatable)
Version     : 1.42.13.wc4                       Vendor: (none)
Release     : 7.el6                         Build Date: Fri Dec 11 19:51:16 2015
Install Date: Wed Feb 24 17:44:21 2016         Build Host: onyx-8-sde1-el6-x8664.onyx.hpdd.intel.com
Group       : System Environment/Base       Source RPM: e2fsprogs-1.42.13.wc4-7.el6.src.rpm
Size        : 3170704                          License: GPLv2
Signature   : (none)
URL         : https://downloads.hpdd.intel.com/public/e2fsprogs/
Summary     : Utilities for managing ext2, ext3, and ext4 filesystems
Description :
The e2fsprogs package contains a number of utilities for creating,
checking, modifying, and correcting any inconsistencies in second,
third and fourth extended (ext2/ext3/ext4) filesystems. E2fsprogs
contains e2fsck (used to repair filesystem inconsistencies after an
unclean shutdown), mke2fs (used to initialize a partition to contain
an empty ext2 filesystem), debugfs (used to examine the internal
structure of a filesystem, to manually repair a corrupted
filesystem, or to create test cases for e2fsck), tune2fs (used to
modify filesystem parameters), and most of the other core ext2fs
filesystem utilities.

You should install the e2fsprogs package if you need to manage the
performance of an ext2, ext3, or ext4 filesystem.

Comment by Gerrit Updater [ 18/Mar/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/18999
Subject: LU-7867 debugfs: locate EA value correctly
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 4f64aa8135d0865fba88c3ee49981362ec0cb994

Comment by nasf (Inactive) [ 18/Mar/16 ]

Matt,

Would you please apply the above patch, which fixes the debugfs issue, to your e2fsprogs, and then run the patched debugfs on your system?

Thanks!

Comment by Andreas Dilger [ 18/Mar/16 ]

Sorry, committed this comment too quickly.

Something strange is going on here. I just checked my local MDT and OST filesystems with debugfs 1.42.12.wc1 and it prints the xattr values properly. Checking with debugfs 1.42.13.wc4 appears to show the same problem; I am looking into it more closely.

Comment by Andreas Dilger [ 18/Mar/16 ]

Updated patch http://review.whamcloud.com/18999 should now resolve the problem introduced in the upstream code.

Comment by Matt Ezell [ 18/Mar/16 ]

I used the build from Jenkins and it now correctly shows the EAs for /O/0/d8/54354408:

  lma = "08 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 e4 20 03 00 00 00 00 " (24)
  lma: fid=[0x100000000:0x320e401:0x0] compat=8 incompat=0
  fid = "f8 95 00 22 29 03 00 02 00 00 00 00 02 00 00 00 " (16)
  fid: parent=[0x2000329220095f8:0x0:0x0] stripe=2

Comment by Matt Ezell [ 18/Mar/16 ]

Andreas - is the parent fid being printed correctly? I ran a quick test on a non-corrupted OST:

# lfs getstripe test
test
lmm_stripe_count:   4
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  123
        obdidx           objid           objid           group
           123          120572        0x1d6fc                0
           124          120522        0x1d6ca                0
           125          120541        0x1d6dd                0
           126          120607        0x1d71f                0
# lfs path2fid test
[0x200000410:0x2254:0x0]
# debugfs -c -R 'stat O/0/d28/120572' /dev/mapper/f1-ddn1a-l24
...
Extended attributes stored in inode body: 
  lma = "00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 fc d6 01 00 00 00 00 00 " (24)
  lma: fid=[0x100000000:0x1d6fc:0x0] compat=0 incompat=0
  fid = "54 22 00 10 04 00 00 02 00 00 00 00 00 00 00 00 " (16)
  fid: parent=[0x200000410002254:0x0:0x0] stripe=0

Should I open a separate ticket for that?

Comment by Andreas Dilger [ 19/Mar/16 ]

No, the parent doesn't look correct. It should exactly match the output from lfs fid2path, but it looks offset by 3 bytes for some reason. We can add the fix into the same patch, no need for a new ticket.

Comment by nasf (Inactive) [ 21/Mar/16 ]

Matt,

Is the "test" an existing file (updated from old device) or new created one? If it is the latter case, how do you generate it? and would you please to retest test with -1 level debug collected on the client and related OST?

Your system is Lustre 2.5.5 based (both client and server), right? Are there any additional patches on top of that?

(BTW, I have tried Lustre 2.5.5 in my local VM environment, but cannot reproduce your trouble with the strange PFID EA, so I need more logs from you.)

Thanks!

Comment by Matt Ezell [ 21/Mar/16 ]

For this file system, we have a mixture of clients - most are 2.5, but we also have some 1.8 and 2.7. I just ran some tests, and it seems that new files created under 2.5 or 2.7 are OK. But the file I created on the 1.8 client shows the strange PFID EA.

The 1.8 clients are going away soon, so no need to debug that client, but we do have a ton of files on disk with the strange PFID EA. My main concern is that a future lfsck run (under 2.8 servers, eventually) might mark these files as damaged and/or orphaned.

I'll run some tests with debugging later today and upload relevant logs. Thanks.

Comment by nasf (Inactive) [ 21/Mar/16 ]

It is the client that sends the PFID information to the OST on write/setattr. According to your description, the 1.8 client may have sent an "ostid"-formatted PFID to the OST, which caused the strange PFID EA. Usually, as long as the OST-object is NOT a real orphan, then after upgrading your system to Lustre 2.6 or newer, the layout LFSCK will handle it as an unmatched MDT/OST object pair and will correct the PFID EA with the right MDT-object's FID.

Comment by Matt Ezell [ 21/Mar/16 ]

I think the strange PFID EA is coming from fid_flatten. It takes the FID and does

ino = (seq << 24) + ((seq >> 24) & 0xffffff0000ULL) + fid_oid(fid);

to make the 128-bit FID fit in a 64-bit inode. I didn't specifically see this in the log, but I believe this is what the client is setting as PFID. Since the 1.8 clients are going away, it's not worth looking into it further. We'll make sure to test LFSCK in 2.8 against objects created under 1.8 before upgrading the server.
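
As a quick sanity check (my own arithmetic, assuming the formula above is what the 1.8 client applies), flattening the FID I tried earlier reproduces the strange parent sequence exactly:

seq=$((0x200032922)); oid=$((0x95f8))
printf '%#x\n' $(( (seq << 24) + ((seq >> 24) & 0xffffff0000) + oid ))
# prints 0x2000329220095f8, matching the "parent" sequence debugfs showed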

So, back to the original issue at hand. I guess we need to take a downtime to run e2fsck. There could be two outcomes:

  • e2fsck identifies these objects as damaged and repairs or unlinks them
  • It doesn't notice the issue and runs cleanly

If situation #2 happens, should we manually unlink them? Then, how do we ensure there aren't more damaged objects on this OST?

Comment by Andreas Dilger [ 23/Mar/16 ]

I suspect that e2fsck will just consider these files as hard linked, so you will find them by looking for regular files under O/0/ with nlink=2.
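
If it helps, a rough way to do that scan once the OST can be taken offline (the mountpoint is a placeholder; mount read-only):

mount -t ldiskfs -o ro /dev/mapper/f1-ddn1d-l53 /mnt/ost
find /mnt/ost/O/0 -type f -links 2
umount /mnt/ost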

Comment by Gerrit Updater [ 29/Mar/16 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/18999/
Subject: LU-7867 debugfs: fix check for out-of-bound xattr value
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: 595b51179eaafdc1c50ab8348cb83d9429aa2bfa

Comment by nasf (Inactive) [ 12/Apr/16 ]

Is there any further information about this issue? Have you been able to run e2fsck as discussed above?

Comment by Gerrit Updater [ 12/Apr/16 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/19493
Subject: LU-7867 e2fsprogs: update build version to 1.42.13.wc5
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: fae5f9fa7b3bb6bf134f0560b163276f9f6187a3

Comment by Dustin Leverman [ 13/Apr/16 ]

We are in a holding pattern on this until the customer allows us to take an outage. It will probably be another 5-6 weeks before we get the green light. I think it is okay to close this ticket.

Our plan is to run e2fsck on the LUN; if that doesn't fix the issue, we will wipe out the 2 OST objects by hand and run lfsck when we update to Lustre 2.8 servers.

Thanks,
Dustin

Comment by nasf (Inactive) [ 13/Apr/16 ]

Thanks, Dustin, for the explanation. We will close the ticket after landing Andreas' latest patch.

Comment by Gerrit Updater [ 02/May/16 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/19493/
Subject: LU-7867 e2fsprogs: update build version to 1.42.13.wc5
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: 00c372887fff6d4a9baae075c3fc1523c47d8ad4

Comment by nasf (Inactive) [ 09/May/16 ]

Related patches have been landed.
