[LU-17267] mdt_getattr_internal()) nbp10-MDT0001: getattr error for [0xdc0003712:0x4e9b:0x0]: rc = -34 Created: 06/Nov/23  Updated: 19/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.3
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Mahmoud Hanafi Assignee: Lai Siyao
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File LU-17267.dk.gz    
Severity: 2
Rank (Obsolete): 9223372036854775807

 Description   

Please advise how to correct a number of directories that all show the same issue:

ls -l returns an error:

[root@nbp10-srv1 data]# ls -l  weld_analysis
ls: cannot access 'weld_analysis': Numerical result out of range 

The directory is on MDT0001.

Debug output shows the following (the full debug log will be attached):

00000004:00020000:1.0:1699293797.851375:0:2470770:0:(mdt_handler.c:1454:mdt_getattr_internal()) nbp10-MDT0001: getattr error for [0xdc0003712:0x4e9b:0x0]: rc = -34 

debugfs stat of the FID on the MDT looks like this:

debugfs:  stat 0xdc0003712:0x4e9b:0x0
Inode: 306186733   Type: directory    Mode:  0700   Flags: 0x80000
Generation: 2816394278    Version: 0x00000000:00000000
User: 11810   Group: 41007   Project:     0   Size: 4096
File ACL: 2379763458
Links: 4   Blockcount: 16
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5d759848:00000000 -- Sun Sep  8 17:09:44 2019
 atime: 0x6515f065:00000000 -- Thu Sep 28 14:30:13 2023
 mtime: 0x5d759848:00000000 -- Sun Sep  8 17:09:44 2019
crtime: 0x5fd3066d:379a1ac8 -- Thu Dec 10 21:41:01 2020
Size of extra inode fields: 32
Extended attributes:
  lma: fid=[0xdc0003712:0x4e9b:0x0] compat=0 incompat=c
  trusted.lov (352)
  trusted.dmv (352)
  linkea: idx=0 parent=[0x200000a37:0xb:0x0] name='weld_analysis'
EXTENTS:
(0):2379763457 


 Comments   
Comment by Mahmoud Hanafi [ 06/Nov/23 ]

I briefly started lfsck in dry-run mode on MDT0001 and got lots of ugliness right away.

Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: namespace LFSCK add flags for [0xdc0002c7a:0x14617:0x0] in the trace file, flags 2, old 0, new 2: rc = -22
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: namespace LFSCK assistant fail to handle the entry: [0x0:0x0:0x0], parent [0xdc0002c7a:0x14617:0x0], name ..: rc = -22
Nov  6 10:34:37 nbp10-srv3 kernel: CPU: 6 PID: 2704006 Comm: lfsck_namespace Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-477.10.1.el8_lustre.x86_64 #1
Nov  6 10:34:37 nbp10-srv3 kernel: Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 04/20/2023
Nov  6 10:34:37 nbp10-srv3 kernel: Call Trace:
Nov  6 10:34:37 nbp10-srv3 kernel: dump_stack+0x41/0x60
Nov  6 10:34:37 nbp10-srv3 kernel: lfsck_trans_create.part.58+0x63/0x70 [lfsck]
Nov  6 10:34:37 nbp10-srv3 kernel: lfsck_namespace_trace_update+0x978/0x980 [lfsck]
Nov  6 10:34:37 nbp10-srv3 kernel: lfsck_namespace_assistant_handler_p1+0xa36/0x2060 [lfsck]
Nov  6 10:34:37 nbp10-srv3 kernel: ? __switch_to_asm+0x43/0x80
Nov  6 10:34:37 nbp10-srv3 kernel: ? __schedule+0x2d9/0x870
Nov  6 10:34:37 nbp10-srv3 kernel: lfsck_assistant_engine+0x363/0x1c40 [lfsck]
Nov  6 10:34:37 nbp10-srv3 kernel: ? __switch_to+0x10c/0x450
Nov  6 10:34:37 nbp10-srv3 kernel: ? finish_task_switch+0x86/0x2e0
Nov  6 10:34:37 nbp10-srv3 kernel: ? __schedule+0x2d9/0x870
Nov  6 10:34:37 nbp10-srv3 kernel: ? finish_wait+0x80/0x80
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8bc:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: ? lfsck_master_engine+0xcd0/0xcd0 [lfsck]
Nov  6 10:34:37 nbp10-srv3 kernel: kthread+0x134/0x150
Nov  6 10:34:37 nbp10-srv3 kernel: ? set_kthread_struct+0x50/0x50
Nov  6 10:34:37 nbp10-srv3 kernel: ret_from_fork+0x1f/0x40
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: namespace LFSCK add flags for [0xdc0002c82:0x1b015:0x0] in the trace file, flags 2, old 0, new 2: rc = -22
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: namespace LFSCK assistant fail to handle the entry: [0x0:0x0:0x0], parent [0xdc0002c82:0x1b015:0x0], name ..: rc = -22
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8bd:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8be:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8bf:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c0:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c1:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c2:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c3:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c4:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c5:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c6:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c7:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c8:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8c9:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8ca:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8cb:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8cc:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8cd:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8ce:0x0]: rc = 448
Nov  6 10:34:37 nbp10-srv3 kernel: Lustre: nbp10-MDT0001-osd: layout LFSCK master found bad lmm_oi for [0xdc0002c7e:0x1f8cf:0x0]: rc = 448 

 

 

Comment by Peter Jones [ 07/Nov/23 ]

Hi Lai

Could you please advise here?

Thanks

Peter

Comment by Lai Siyao [ 08/Nov/23 ]

Do you have any idea how these files became inaccessible? Could they be accessed before?

Comment by Mahmoud Hanafi [ 08/Nov/23 ]

No idea how they got into this state. I am sure they were OK at some point because the user was using them. There are several directories in this state.

Comment by Mahmoud Hanafi [ 17/Nov/23 ]

I would like to get an update on this case.

Comment by Lai Siyao [ 21/Nov/23 ]

The default LMV of those directories is invalid: the default LMV size should always be 48 bytes, but the debugfs output shows trusted.dmv is 352 bytes, so reading the default LMV fails with -ERANGE (-34). You can delete the default LMV with debugfs: ea_rm <dirname> trusted.dmv.
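
Since several directories are reported to be in this state, the size check described above could be scripted against saved debugfs stat output. The following is only a minimal sketch, not a Lustre tool: the helper names ea_sizes and dmv_is_suspect are hypothetical, the 48-byte expected size is taken from the comment above, and the sample text is copied from the stat output in the description. It only reads text and does not touch the MDT.

```python
import errno
import re

# Assumption taken from the comment above: a valid default LMV EA is 48 bytes.
EXPECTED_DMV_SIZE = 48

# Matches EA lines that debugfs prints as "name (size)", e.g. "  trusted.dmv (352)".
EA_LINE = re.compile(r"^\s*(?P<name>[\w.]+)\s+\((?P<size>\d+)\)\s*$")


def ea_sizes(stat_output):
    """Return {ea_name: size_in_bytes} for EAs listed as 'name (size)'."""
    sizes = {}
    for line in stat_output.splitlines():
        match = EA_LINE.match(line)
        if match:
            sizes[match.group("name")] = int(match.group("size"))
    return sizes


def dmv_is_suspect(stat_output):
    """True if trusted.dmv is present and its size is not the expected one."""
    size = ea_sizes(stat_output).get("trusted.dmv")
    return size is not None and size != EXPECTED_DMV_SIZE


# Sample copied from the debugfs stat output in the description.
SAMPLE = """\
Extended attributes:
  lma: fid=[0xdc0003712:0x4e9b:0x0] compat=0 incompat=c
  trusted.lov (352)
  trusted.dmv (352)
  linkea: idx=0 parent=[0x200000a37:0xb:0x0] name='weld_analysis'
"""

if __name__ == "__main__":
    print(ea_sizes(SAMPLE))
    print("trusted.dmv suspect:", dmv_is_suspect(SAMPLE))
    # rc = -34 in the MDT log is the negated errno value ERANGE.
    print("errno 34 is", errno.errorcode[34])
```

Any directory flagged this way would still need to be reviewed by hand before running ea_rm on it.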

Comment by Peter Jones [ 02/Dec/23 ]

Mahmoud

Have you tried the procedure outlined above?

Peter
