[LU-17164] Old files not accessible anymore with lma incompat=2 and no lov Created: 03/Oct/23 Updated: 05/Oct/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Stephane Thiell | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
2.12.8+patches, CentOS 7.9, ldiskfs |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hello!

[root@oak-cli01 ~]# ls -l /oak/stanford/groups/khavari/users/dfporter/before_2021_projects/genome/fastRepEnrich_hg38/fastRepEnrich/fastRE_Setup
ls: cannot access '/oak/stanford/groups/khavari/users/dfporter/before_2021_projects/genome/fastRepEnrich_hg38/fastRepEnrich/fastRE_Setup/pseudogenome.fasta': No such file or directory
ls: cannot access '/oak/stanford/groups/khavari/users/dfporter/before_2021_projects/genome/fastRepEnrich_hg38/fastRepEnrich/fastRE_Setup/repnames.bed': No such file or directory
total 0
-????????? ? ? ? ? ? pseudogenome.fasta
-????????? ? ? ? ? ? repnames.bed

We found them with no trusted.lov, just a trusted.lma and ACLs (system.posix_acl_access), owned by root:root with permissions 0000 (note that I have since changed the ownership/permissions, which is reflected in the debugfs output below, so the ctime has been updated too):

oak-MDT0000> debugfs: stat ROOT/stanford/groups/khavari/users/dfporter/before_2021_projects/genome/fastRepEnrich_hg38/fastRepEnrich/fastRE_Setup/pseudogenome.fasta
Inode: 745295211 Type: regular Mode: 0440 Flags: 0x0
Generation: 392585436 Version: 0x00000000:00000000
User: 0 Group: 0 Project: 0 Size: 0
File ACL: 0
Links: 1 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x651b1517:a256cacc -- Mon Oct 2 12:08:07 2023
atime: 0x649be9dc:d980bf08 -- Wed Jun 28 01:05:48 2023
mtime: 0x5e7ae8a2:437ca450 -- Tue Mar 24 22:14:10 2020
crtime: 0x649be9dc:d980bf08 -- Wed Jun 28 01:05:48 2023
Size of extra inode fields: 32
Extended attributes:
lma: fid=[0x2f800028cf:0x944c:0x0] compat=0 incompat=2
system.posix_acl_access:
user::r--
group::rwx
group:3352:rwx
mask::r--
other::---
BLOCKS:
oak-MDT0000> debugfs: stat ROOT/stanford/groups/khavari/users/dfporter/before_2021_projects/genome/fastRepEnrich_hg38/fastRepEnrich/fastRE_Setup/repnames.bed
Inode: 745295212 Type: regular Mode: 0440 Flags: 0x0
Generation: 392585437 Version: 0x00000000:00000000
User: 0 Group: 0 Project: 0 Size: 0
File ACL: 0
Links: 1 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x651b1517:a256cacc -- Mon Oct 2 12:08:07 2023
atime: 0x649be9dc:d980bf08 -- Wed Jun 28 01:05:48 2023
mtime: 0x5e7ae8ad:07654c1c -- Tue Mar 24 22:14:21 2020
crtime: 0x649be9dc:d980bf08 -- Wed Jun 28 01:05:48 2023
Size of extra inode fields: 32
Extended attributes:
lma: fid=[0x2f800028cf:0x953d:0x0] compat=0 incompat=2
system.posix_acl_access:
user::r--
group::rwx
group:3352:rwx
mask::r--
other::---
BLOCKS:
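For reference, the absence of trusted.lov can also be double-checked with getfattr on the ldiskfs MDT itself. This is only a sketch: the block device and mount point below are placeholders for our actual ones, and it assumes the MDT can be mounted read-only as ldiskfs (e.g. during a maintenance window, or against an LVM snapshot):

# mount the MDT read-only as ldiskfs (device path and mount point are placeholders)
mount -t ldiskfs -o ro /dev/mapper/oak-mdt0000 /mnt/mdt0000
# dump all xattrs; trusted.lma and system.posix_acl_access show up, trusted.lov does not
getfattr -d -m - -e hex /mnt/mdt0000/ROOT/stanford/groups/khavari/users/dfporter/before_2021_projects/genome/fastRepEnrich_hg38/fastRepEnrich/fastRE_Setup/pseudogenome.fasta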
Note also that the crtime is recent because we migrated this MDT (MDT0000) to new hardware in June 2023 using a backup/restore method, but we verified yesterday that these files were already in this state before the MDT migration (we still have access to the old storage array), so we know it's not something we introduced during the migration. Just in case you noticed the crtime and were going to ask. Timeline as we understand it:

Changelog events on those FIDs (we log them to Splunk):

2022-06-08T13:15:14.793547861-0700 mdt=oak-MDT0002 id=9054081490 type=SATTR flags=0x44 uid=0 gid=0 target=[0x2f800028cf:0x944c:0x0]
2022-06-08T13:15:14.795309940-0700 mdt=oak-MDT0002 id=9054081491 type=SATTR flags=0x44 uid=0 gid=0 target=[0x2f800028cf:0x953d:0x0]

It's really curious to see those coming from oak-MDT0002!? We are also seeing these errors on oak-md1-s1:

Oct 02 11:35:12 oak-md1-s1 kernel: LustreError: 59611:0:(mdt_open.c:1227:mdt_cross_open()) oak-MDT0002: [0x2f800028cf:0x944c:0x0] doesn't exist!: rc = -14
Oct 02 11:35:37 oak-md1-s1 kernel: LustreError: 59615:0:(mdt_open.c:1227:mdt_cross_open()) oak-MDT0002: [0x2f800028cf:0x944c:0x0] doesn't exist!: rc = -14

Could Lustre be confused about which MDT these FIDs are supposed to be served from because of corrupted metadata? Why on earth would oak-MDT0002 be involved here?

Parent FID and striping of the parent directory:

[root@oak-cli01 ~]# lfs path2fid /oak/stanford/groups/khavari/users/dfporter/before_2021_projects/genome/fastRepEnrich_hg38/fastRepEnrich/fastRE_Setup
[0x200033e88:0x114:0x0]
[root@oak-cli01 ~]# lfs getdirstripe /oak/stanford/groups/khavari/users/dfporter/before_2021_projects/genome/fastRepEnrich_hg38/fastRepEnrich/fastRE_Setup
lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none
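As a quick cross-check from a client, resolving the FID directly should show which MDT claims it. This is just a sketch (no real output pasted here); we would expect it to fail with an error similar to the mdt_cross_open() one above, since the target inode is missing on MDT0002:

[root@oak-cli01 ~]# lfs fid2path /oak "[0x2f800028cf:0x944c:0x0]"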
We tried to run lfsck namespace, but it crashed our MDS, likely due to an LBUG. Crash dump summary:

KERNEL: /usr/lib/debug/lib/modules/3.10.0-1160.83.1.el7_lustre.pl1.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 64
DATE: Mon Oct 2 22:55:53 2023
UPTIME: 48 days, 16:17:05
LOAD AVERAGE: 2.94, 3.39, 3.52
TASKS: 3287
NODENAME: oak-md1-s2
RELEASE: 3.10.0-1160.83.1.el7_lustre.pl1.x86_64
VERSION: #1 SMP Sun Feb 19 18:38:37 PST 2023
MACHINE: x86_64 (3493 Mhz)
MEMORY: 255.6 GB
PANIC: "Kernel panic - not syncing: LBUG"
PID: 24913
COMMAND: "lfsck_namespace"
TASK: ffff8e62979fa100 [THREAD_INFO: ffff8e5f41a48000]
CPU: 8
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 24913 TASK: ffff8e62979fa100 CPU: 8 COMMAND: "lfsck_namespace"
#0 [ffff8e5f41a4baa8] machine_kexec at ffffffffaac69514
#1 [ffff8e5f41a4bb08] __crash_kexec at ffffffffaad29d72
#2 [ffff8e5f41a4bbd8] panic at ffffffffab3ab713
#3 [ffff8e5f41a4bc58] lbug_with_loc at ffffffffc06538eb [libcfs]
#4 [ffff8e5f41a4bc78] lfsck_namespace_assistant_handler_p1 at ffffffffc1793e68 [lfsck]
#5 [ffff8e5f41a4bd80] lfsck_assistant_engine at ffffffffc177604e [lfsck]
#6 [ffff8e5f41a4bec8] kthread at ffffffffaaccb511
#7 [ffff8e5f41a4bf50] ret_from_fork_nospec_begin at ffffffffab3c51dd
According to Robinhood, these files' stripe count is likely 1, so we're going to try to find their object IDs. Do you have any idea how to resolve this without running lfsck? How can we find/reattach the objects? Thanks!
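In case it is useful, this is the rough approach we are considering for locating the orphaned OST objects via their parent-FID backpointer. It is only a sketch: the OST device path, mount point and object paths are placeholders, and it assumes the OSTs are ldiskfs and can be inspected through a read-only ldiskfs mount (or a snapshot):

# on an OSS, mount one OST read-only as ldiskfs (device path and mount point are placeholders)
mount -t ldiskfs -o ro /dev/mapper/oak-ostNN /mnt/ostNN
# scan object files and print any whose parent FID backpointer matches the missing MDT FID
for obj in /mnt/ostNN/O/*/d*/*; do
    ll_decode_filter_fid "$obj" 2>/dev/null | grep -q '0x2f800028cf:0x944c' && echo "$obj"
done
|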
| Comments |
| Comment by Stephane Thiell [ 04/Oct/23 ] |
|
We still don't know what caused this in the first place. Perhaps it was due to an lfs migrate which didn't end well, or something introduced when we upgraded from Lustre 2.10 to 2.12. Any clue would be appreciated... |
| Comment by Andreas Dilger [ 05/Oct/23 ] |
|
Stephane, the inode is marked in the trusted.lma xattr with incompat: 2, which is:

enum lma_incompat {
        LMAI_AGENT = 0x00000002, /* agent inode */

This means that this is a "proxy" inode created on the local MDT that points at an inode with the given FID 0x2f800028cf:0x944c:0x0 on the remote MDT, presumably MDT0002. Inodes created on MDT0000 would have a sequence number like 0x20000xxxx. Because the remote MDT0002 inode doesn't exist, it might be exposing the underlying agent inode, or possibly you are extracting this info from the underlying ldiskfs filesystem? You would need to look for 0x2f800028cf:0x944c:0x0 in the REMOTE_PARENT_DIR on MDT0002 to see if it is there or missing. Running LFSCK would potentially be able to recreate the inode on MDT0002 if the OST objects still exist (they will have a backpointer to 0x2f800028cf:0x944c:0x0). If the OST objects are missing, then you could delete this inode from the local filesystem (possibly via ldiskfs).
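Something along these lines could be used for the REMOTE_PARENT_DIR check (only a sketch; the MDT0002 block device path is a placeholder, and debugfs in catastrophic mode opens the device read-only, so it is safe to run):

# list REMOTE_PARENT_DIR on the MDT0002 ldiskfs device and look for the FID (device path is a placeholder)
debugfs -c -R 'ls -l /REMOTE_PARENT_DIR' /dev/mapper/oak-mdt0002 2>/dev/null | grep '0x2f800028cf:0x944c'
|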
| Comment by Stephane Thiell [ 05/Oct/23 ] |
|
Hi Andreas! Ah! All the inodes in REMOTE_PARENT_DIR on MDT0002 start with the sequence 0x2f8000xxxx, but 0x2f800028cf:0x944c:0x0 cannot be found. That also explains the mdt_cross_open() errors we were seeing on MDT0002. It looks like this user had access to another directory tree on MDT0002. Do you think it is possible that a mv done by a user at some point (possibly under Lustre 2.10 or 2.12) was somehow incomplete, perhaps after a server crash, and left this agent inode on MDT0000 but no target inode on MDT0002? I am glad to hear that LFSCK would likely help in that case. We'd like to start using it, but only after we upgrade Oak to 2.15.
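For the record, once we are on 2.15 we expect to run something like the following from the MDS. This is only a sketch of the commands as we understand them from the manual, not something we have run yet:

# start a namespace LFSCK on all MDTs (run on the MDS holding oak-MDT0000)
lctl lfsck_start -M oak-MDT0000 -t namespace -A
# monitor progress
lctl get_param mdd.oak-MDT0000.lfsck_namespace
|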