[LU-17169] can't delete corrupted directory Created: 05/Oct/23 Updated: 03/Nov/23 Resolved: 03/Nov/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Mahmoud Hanafi | Assignee: | Peter Jones |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
A user has a corrupted directory:
ls -l |grep vol
ls: cannot access 'volcano': No such file or directory
d????????? ? ? ? ? ? volcano
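The d????????? line is how GNU ls -l reports an entry whose name and type come back from readdir() but whose lstat() then fails, here with ENOENT. Below is a minimal Python sketch of the same check. It could be used on a client to list any other such entries in a directory. The helper name find_unstatable_entries() is hypothetical, and the sketch assumes the filesystem returns a usable d_type from readdir().

```python
import os


def find_unstatable_entries(path):
    """List entries that readdir() returns but lstat() cannot stat.

    Returns (name, is_dir_per_d_type, error_message) tuples. These are the
    entries GNU `ls -l` shows with '?' fields for mode, owner, size, and date.
    """
    broken = []
    with os.scandir(path) as entries:
        for entry in entries:
            # With follow_symlinks=False this normally uses the d_type from
            # readdir() and does not need a stat() call.
            is_dir = entry.is_dir(follow_symlinks=False)
            try:
                entry.stat(follow_symlinks=False)  # equivalent to lstat()
            except OSError as exc:
                broken.append((entry.name, is_dir, exc.strerror))
    return broken
```

For example, calling find_unstatable_entries("/path/to/parent") on the parent directory (placeholder path) should include an entry like ("volcano", True, "No such file or directory") if the directory behaves as in the listing above.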
The directory is on the third MDT. Here is its debugfs stat output:
debugfs: ls -l
....
151872827 40000 (18) 0 0 4096 31-Dec-1969 16:00 volcano
....
debugfs: stat volcano
Inode: 151872827 Type: directory Mode: 0000 Flags: 0x80000
Generation: 2866389135 Version: 0x00000000:00000000
User: 0 Group: 0 Project: 0 Size: 4096
File ACL: 0
Links: 2 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x651736e5:ea8ba3cc -- Fri Sep 29 13:43:17 2023
atime: 0x00000000:fffffff8 -- Wed Dec 31 16:00:00 1969
mtime: 0x00000000:fffffff8 -- Wed Dec 31 16:00:00 1969
crtime: 0x651736e5:ea4e9a98 -- Fri Sep 29 13:43:17 2023
Size of extra inode fields: 32
Extended attributes:
lma: fid=[0x28003d638:0x1:0x0] compat=0 incompat=2
EXTENTS:
(0):2357133314
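The raw values in the debugfs output can be decoded by hand. The minimal Python sketch below assumes the usual ext4 large-inode timestamp layout: the low 2 bits of the extra word extend the seconds, and the upper 30 bits hold nanoseconds. It decodes the four timestamps and the directory mode shown above.

Under that assumption, the ctime decodes to 2023-09-29 20:43:17 UTC, which is 13:43:17 PDT and matches the debugfs output. The atime and mtime have zero seconds and a nanosecond field of 1073741822. That value is outside the valid 0 to 999,999,999 range and equals the Linux UTIME_OMIT constant, (1 << 30) - 2. Whether that is related to the corruption is not established here. The helper decode_ext4_time() is illustrative only.

```python
import stat
from datetime import datetime, timezone

NSEC_PER_SEC = 1_000_000_000
UTIME_OMIT = (1 << 30) - 2  # Linux utimensat() "leave unchanged" sentinel


def decode_ext4_time(raw):
    """Decode a debugfs 'SECONDS:EXTRA' timestamp pair.

    Assumes the ext4 large-inode layout: bits 0-1 of EXTRA extend the
    seconds beyond 32 bits, bits 2-31 hold the nanoseconds.
    Returns (seconds, nanoseconds, nanoseconds_in_range).
    """
    sec_hex, extra_hex = raw.split(":")
    seconds = int(sec_hex, 16)
    if seconds >= 1 << 31:  # base seconds are stored as a signed 32-bit value
        seconds -= 1 << 32
    extra = int(extra_hex, 16)
    seconds += (extra & 0x3) << 32  # epoch extension bits
    nsec = extra >> 2
    return seconds, nsec, nsec < NSEC_PER_SEC


# Values copied from the debugfs stat output above.
inode_times = {
    "ctime": "0x651736e5:ea8ba3cc",
    "atime": "0x00000000:fffffff8",
    "mtime": "0x00000000:fffffff8",
    "crtime": "0x651736e5:ea4e9a98",
}

for name, raw in inode_times.items():
    seconds, nsec, nsec_ok = decode_ext4_time(raw)
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    print(f"{name:6} {when.isoformat()} nsec={nsec} "
          f"in_range={nsec_ok} equals_UTIME_OMIT={nsec == UTIME_OMIT}")

# debugfs "ls -l" printed the mode as 40000 (octal): S_IFDIR, no permission bits.
print(stat.filemode(0o40000))
```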
How should we delete this? Should we run an lfsck? |
| Comments |
| Comment by Andreas Dilger [ 05/Oct/23 ] |
|
Before deleting anything, first check whether any errors are reported in the console logs on the client or the MDS. Depending on the error, it might make sense to run e2fsck or LFSCK to see if the directory can be repaired. You could also run a read-only e2fsck to see whether it reports any errors for this directory's inode number. |
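The read-only e2fsck check suggested above can be scripted. The minimal Python sketch below runs e2fsck with -f, which forces a full check, and -n, which opens the device read-only and answers "no" to all questions. It then keeps only the output lines that mention the directory's inode number, 151872827. The device path is a placeholder and the helper names are hypothetical. The filter deliberately matches the bare number rather than assuming any particular e2fsck message wording.

```python
import re
import subprocess


def readonly_e2fsck_cmd(device):
    # -f: force a full check even if the filesystem is marked clean.
    # -n: open the device read-only and answer "no" to every question.
    return ["e2fsck", "-fn", device]


def lines_mentioning_inode(output, ino):
    """Keep e2fsck output lines that mention `ino` as a whole number.

    No particular e2fsck message wording is assumed.
    """
    pattern = re.compile(rf"\b{ino}\b")
    return [line for line in output.splitlines() if pattern.search(line)]


def check_inode(device, ino):
    # e2fsck exits non-zero when it finds problems, so don't treat that as failure.
    result = subprocess.run(readonly_e2fsck_cmd(device),
                            capture_output=True, text=True, check=False)
    return lines_mentioning_inode(result.stdout + result.stderr, ino)
```

For example, check_inode("/dev/MDT_DEVICE", 151872827), where /dev/MDT_DEVICE stands for the MDT's ldiskfs device. Results from a mounted device are not reliable, so this fits the dedicated downtime mentioned below.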
| Comment by Mahmoud Hanafi [ 06/Oct/23 ] |
|
I started an LFSCK dry run. It reports a large number of layout repairs (layout_repaired), like this:
layout_mdts_init: 0
layout_mdts_scanning-phase1: 1
layout_mdts_scanning-phase2: 2
layout_mdts_completed: 0
layout_mdts_failed: 0
layout_mdts_stopped: 0
layout_mdts_paused: 0
layout_mdts_crashed: 0
layout_mdts_partial: 0
layout_mdts_co-failed: 0
layout_mdts_co-stopped: 0
layout_mdts_co-paused: 0
layout_mdts_unknown: 0
layout_osts_init: 0
layout_osts_scanning-phase1: 0
layout_osts_scanning-phase2: 69
layout_osts_completed: 0
layout_osts_failed: 0
layout_osts_stopped: 0
layout_osts_paused: 0
layout_osts_crashed: 0
layout_osts_partial: 0
layout_osts_co-failed: 0
layout_osts_co-stopped: 0
layout_osts_co-paused: 0
layout_osts_unknown: 0
layout_repaired: 92227209
namespace_mdts_init: 0
namespace_mdts_scanning-phase1: 1
namespace_mdts_scanning-phase2: 2
namespace_mdts_completed: 0
namespace_mdts_failed: 0
namespace_mdts_stopped: 0
namespace_mdts_paused: 0
namespace_mdts_crashed: 0
namespace_mdts_partial: 0
namespace_mdts_co-failed: 0
namespace_mdts_co-stopped: 0
namespace_mdts_co-paused: 0
namespace_mdts_unknown: 0
namespace_osts_init: 0
namespace_osts_scanning-phase1: 0
namespace_osts_scanning-phase2: 0
namespace_osts_completed: 0
namespace_osts_failed: 0
namespace_osts_stopped: 0
namespace_osts_paused: 0
namespace_osts_crashed: 0
namespace_osts_partial: 0
namespace_osts_co-failed: 0
namespace_osts_co-stopped: 0
namespace_osts_co-paused: 0
namespace_osts_unknown: 0
namespace_repaired: 1051 |
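Most of these LFSCK counters are zero, so the interesting ones are easier to see with the zeros dropped. The minimal Python sketch below assumes the output is a sequence of name: value pairs of the kind printed by lctl lfsck_query, either one per line or run together as in the original paste. It parses them into a dict and prints only the nonzero counters. The helper names are hypothetical.

```python
import re

# name: value pairs; names may contain digits, underscores, and hyphens.
PAIR_RE = re.compile(r"([A-Za-z0-9_-]+):\s*(\d+)")


def parse_lfsck_counters(text):
    """Parse 'name: value' counters, one per line or run together on one line."""
    return {name: int(value) for name, value in PAIR_RE.findall(text)}


def nonzero(counters):
    """Drop counters whose value is zero."""
    return {name: value for name, value in counters.items() if value}


# An excerpt of the dry-run output above, pasted as a single line.
sample = ("layout_mdts_init: 0 layout_mdts_scanning-phase1: 1 "
          "layout_mdts_scanning-phase2: 2 layout_mdts_co-failed: 0 "
          "layout_osts_scanning-phase2: 69 layout_repaired: 92227209 "
          "namespace_mdts_scanning-phase2: 2 namespace_repaired: 1051")

for name, value in nonzero(parse_lfsck_counters(sample)).items():
    print(f"{name:32} {value}")
```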
| Comment by Mahmoud Hanafi [ 06/Oct/23 ] |
|
I am also attaching the full lfsck dry-run output. We will need to schedule dedicated time to run e2fsck. |
| Comment by Andreas Dilger [ 06/Oct/23 ] |
What does "lfs getstripe -v /mnt/nbp11/.lustre/fid/0x28000233a:0x46cf:0x0" report for the file layout? The lmm_oi is the old "backpointer" from the file layout to the file's FID. It isn't really used for anything these days, and it doesn't necessarily indicate a problem. If the filesystem is old, it may have had a bug that wrote the lmm_oi in an incorrect format. |
| Comment by Mahmoud Hanafi [ 06/Oct/23 ] |
|
/nobackupp11/.lustre/fid/0x28000233a:0x46cf:0x0
lmm_magic:         0x0BD10BD0
lmm_seq:           0x2000264db
lmm_object_id:     0x4f02
lmm_fid:           [0x2000264db:0x4f02:0x0]
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 22
    obdidx    objid      objid      group
        22    6466362    0x62ab3a   0
This is a very old filesystem. |
| Comment by Andreas Dilger [ 06/Oct/23 ] |
|
It looks like the FID stored in the layout is different than the FID of the file. That might be because the file was migrated but the layout FID was not updated. That was an old bug which has since been fixed. |
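The minimal Python sketch below makes this comparison concrete. It parses both FIDs, each made of a sequence, an object ID, and a version written as hexadecimal fields, and compares them. The FID in the .lustre/fid path is [0x28000233a:0x46cf:0x0], while the layout's lmm_fid is [0x2000264db:0x4f02:0x0]. The Fid type and the parse_fid() helper are illustrative only and are not part of any Lustre API.

```python
import re
from typing import NamedTuple


class Fid(NamedTuple):
    """A Lustre FID: sequence, object ID within the sequence, version."""
    seq: int
    oid: int
    ver: int

    def __str__(self):
        return f"[{self.seq:#x}:{self.oid:#x}:{self.ver:#x}]"


# Matches 'SEQ:OID:VER' with optional surrounding brackets.
FID_RE = re.compile(r"\[?(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]?")


def parse_fid(text):
    """Return the first FID found in `text`, e.g. a .lustre/fid path."""
    match = FID_RE.search(text)
    if match is None:
        raise ValueError(f"no FID found in {text!r}")
    return Fid(*(int(part, 16) for part in match.groups()))


# Values copied from the lfs getstripe -v output above.
file_fid = parse_fid("/nobackupp11/.lustre/fid/0x28000233a:0x46cf:0x0")
layout_fid = parse_fid("lmm_fid: [0x2000264db:0x4f02:0x0]")

print("file FID:  ", file_fid)
print("layout FID:", layout_fid)
print("match" if file_fid == layout_fid else "mismatch")
```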
| Comment by Peter Jones [ 20/Oct/23 ] |
|
Is anything else needed here, Mahmoud, or can we close this ticket out? |
| Comment by Peter Jones [ 03/Nov/23 ] |
|
There seem to be no further questions.