[LU-17267] mdt_getattr_internal()) nbp10-MDT0001: getattr error for [0xdc0003712:0x4e9b:0x0]: rc = -34 Created: 06/Nov/23 Updated: 19/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Mahmoud Hanafi | Assignee: | Lai Siyao |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 2 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Please advise how to correct a number of files that all show the same issue: ls -l returns an error.
[root@nbp10-srv1 data]# ls -l weld_analysis
ls: cannot access 'weld_analysis': Numerical result out of range
The directory is on MDT0001. Debug output shows the following (full debug log will be attached):
00000004:00020000:1.0:1699293797.851375:0:2470770:0:(mdt_handler.c:1454:mdt_getattr_internal()) nbp10-MDT0001: getattr error for [0xdc0003712:0x4e9b:0x0]: rc = -34
A stat of the FID on the MDT looks like this:
debugfs: stat 0xdc0003712:0x4e9b:0x0
Inode: 306186733 Type: directory Mode: 0700 Flags: 0x80000
Generation: 2816394278 Version: 0x00000000:00000000
User: 11810 Group: 41007 Project: 0 Size: 4096
File ACL: 2379763458
Links: 4 Blockcount: 16
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x5d759848:00000000 -- Sun Sep 8 17:09:44 2019
atime: 0x6515f065:00000000 -- Thu Sep 28 14:30:13 2023
mtime: 0x5d759848:00000000 -- Sun Sep 8 17:09:44 2019
crtime: 0x5fd3066d:379a1ac8 -- Thu Dec 10 21:41:01 2020
Size of extra inode fields: 32
Extended attributes:
  lma: fid=[0xdc0003712:0x4e9b:0x0] compat=0 incompat=c
  trusted.lov (352)
  trusted.dmv (352)
  linkea: idx=0 parent=[0x200000a37:0xb:0x0] name='weld_analysis'
EXTENTS:
(0):2379763457 |
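Since several directories are affected, the getattr error lines in the debug log can be scanned mechanically to list the affected FIDs. The following is a minimal sketch, not part of the ticket: the function name `parse_getattr_error` and the regular expression are illustrative assumptions based only on the single log line quoted above, and the errno names come from the Python standard library on the host where the script runs.

```python
import errno
import re

# Pattern modelled on the one getattr error line quoted in the description.
# Assumption: other affected FIDs are logged in the same format.
GETATTR_ERR = re.compile(
    r"getattr error for (?P<fid>\[0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+\]): "
    r"rc = (?P<rc>-?\d+)"
)


def parse_getattr_error(line):
    """Return (fid, rc, errno_name) for a getattr error line, else None."""
    match = GETATTR_ERR.search(line)
    if match is None:
        return None
    rc = int(match.group("rc"))
    # Negative return codes are negated errno values; look up the symbol.
    if rc < 0:
        name = errno.errorcode.get(-rc, "unknown")
    else:
        name = "not an errno"
    return match.group("fid"), rc, name


sample = (
    "00000004:00020000:1.0:1699293797.851375:0:2470770:0:"
    "(mdt_handler.c:1454:mdt_getattr_internal()) nbp10-MDT0001: "
    "getattr error for [0xdc0003712:0x4e9b:0x0]: rc = -34"
)
print(parse_getattr_error(sample))
```

Running the function over every line of a debug dump and collecting the non-None results would give one entry per failing getattr, which can then be checked individually with debugfs.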
| Comments |
| Comment by Mahmoud Hanafi [ 06/Nov/23 ] |
|
I briefly started lfsck in dry-run mode on MDT0001 and got lots of ugliness right away.
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: namespace LFSCK add flags for [0xdc0002c7a:0x14617:0x0] in the trace file, flags 2, old 0, new 2: rc = -22
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: namespace LFSCK assistant fail to handle the entry: [0x0:0x0:0x0], parent [0xdc0002c7a:0x14617:0x0], name ..: rc = -22
Nov 6 10:34:37 nbp10-srv3 kernel: CPU: 6 PID: 2704006 Comm: lfsck_namespace Kdump: loaded Tainted: G OE --------- - - 4.18.0-477.10.1.el8_lustre.x86_64 #1
Nov 6 10:34:37 nbp10-srv3 kernel: Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 04/20/2023
Nov 6 10:34:37 nbp10-srv3 kernel: Call Trace:
Nov 6 10:34:37 nbp10-srv3 kernel: dump_stack+0x41/0x60
Nov 6 10:34:37 nbp10-srv3 kernel: lfsck_trans_create.part.58+0x63/0x70 [lfsck]
Nov 6 10:34:37 nbp10-srv3 kernel: lfsck_namespace_trace_update+0x978/0x980 [lfsck]
Nov 6 10:34:37 nbp10-srv3 kernel: lfsck_namespace_assistant_handler_p1+0xa36/0x2060 [lfsck]
Nov 6 10:34:37 nbp10-srv3 kernel: ? __switch_to_asm+0x43/0x80
Nov 6 10:34:37 nbp10-srv3 kernel: ? __schedule+0x2d9/0x870
Nov 6 10:34:37 nbp10-srv3 kernel: lfsck_assistant_engine+0x363/0x1c40 [lfsck]
Nov 6 10:34:37 nbp10-srv3 kernel: ? __switch_to+0x10c/0x450
Nov 6 10:34:37 nbp10-srv3 kernel: ? finish_task_switch+0x86/0x2e0
Nov 6 10:34:37 nbp10-srv3 kernel: ? __schedule+0x2d9/0x870
Nov 6 10:34:37 nbp10-srv3 kernel: ? finish_wait+0x80/0x80
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8bc:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: ? lfsck_master_engine+0xcd0/0xcd0 [lfsck]
Nov 6 10:34:37 nbp10-srv3 kernel: kthread+0x134/0x150
Nov 6 10:34:37 nbp10-srv3 kernel: ? set_kthread_struct+0x50/0x50
Nov 6 10:34:37 nbp10-srv3 kernel: ret_from_fork+0x1f/0x40
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: namespace LFSCK add flags for [0xdc0002c82:0x1b015:0x0] in the trace file, flags 2, old 0, new 2: rc = -22
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: namespace LFSCK assistant fail to handle the entry: [0x0:0x0:0x0], parent [0xdc0002c82:0x1b015:0x0], name ..: rc = -22
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8bd:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8be:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8bf:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c0:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c1:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c2:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c3:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c4:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c5:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c6:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c7:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c8:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c9:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8ca:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8cb:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8cc:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8cd:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8ce:0x0]: rc = 448
Nov 6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8cf:0x0]: rc = 448
|
| Comment by Peter Jones [ 07/Nov/23 ] |
|
Hi Lai, could you please advise here? Thanks, Peter |
| Comment by Lai Siyao [ 08/Nov/23 ] |
|
Do you have any idea how these files became inaccessible? Could they be accessed before? |
| Comment by Mahmoud Hanafi [ 08/Nov/23 ] |
|
No idea how they got into this state. I am sure they were OK at some point because the user was using them. There are several directories in this state. |
| Comment by Mahmoud Hanafi [ 17/Nov/23 ] |
|
I would like to get an update on this case. |
| Comment by Lai Siyao [ 21/Nov/23 ] |
|
The default LMV of those directories is invalid: the default LMV size should always be 48, but the debugfs output shows 352, so reading the default LMV fails with -ERANGE (-34). You may delete the default LMV with debugfs: ea_rm <dirname> trusted.dmv. |
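The size check described in this comment can be sketched as a small script that reads the text of a debugfs stat listing and flags a trusted.dmv attribute whose size is not 48. This is only an illustration under stated assumptions: the helper names `xattr_sizes` and `dmv_is_valid` are hypothetical, the expected size of 48 is taken from the comment above rather than from the Lustre source, and the parsing assumes the "name (size)" layout seen in the stat output in the description.

```python
import re

# Expected default LMV size, as stated in the comment above (assumption:
# this value applies to the Lustre version in use on the affected system).
EXPECTED_DMV_SIZE = 48

# Matches entries such as "trusted.dmv (352)" in debugfs stat output.
XATTR_SIZE = re.compile(r"(?P<name>trusted\.\w+) \((?P<size>\d+)\)")


def xattr_sizes(stat_output):
    """Map each trusted.* attribute name to the size debugfs reported."""
    return {
        match.group("name"): int(match.group("size"))
        for match in XATTR_SIZE.finditer(stat_output)
    }


def dmv_is_valid(stat_output):
    """Return True if trusted.dmv is absent or has the expected size."""
    size = xattr_sizes(stat_output).get("trusted.dmv")
    return size is None or size == EXPECTED_DMV_SIZE


sample = (
    "Extended attributes: lma: fid=[0xdc0003712:0x4e9b:0x0] compat=0 "
    "incompat=c trusted.lov (352) trusted.dmv (352) linkea: idx=0 "
    "parent=[0x200000a37:0xb:0x0] name='weld_analysis'"
)
print(xattr_sizes(sample))
print(dmv_is_valid(sample))
```

The sketch only reports; removing the attribute remains a manual step using the ea_rm command quoted in the comment, and should be reviewed per directory before it is applied.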
| Comment by Peter Jones [ 02/Dec/23 ] |
|
Mahmoud, have you tried the procedure outlined above? Peter