[LU-15327] e2fsck Unconnected directory inode, '..' should be <The NULL inode> (0) Created: 07/Dec/21  Updated: 09/Dec/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Prasannakumar Nagasubramani Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

client cluster:Walleye-P5

20 x Lustre 2.15 clients

4 X LNet routers

Lustre client:

2.14.55_81_gc26b347

storage cluster: kjcf05

NEO build: 6.0-65-cm-21.10.24-g2c588a6

Lustre server:

2.14.55_81_gc26b347

Model SSUs

1xE1000D 1xE1000F

4 x OSS nodes with 2 x HDD OSTs and 2 x flash OSTs


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The e2fsck test reports unconnected directory inodes:

Unconnected directory inode 520110725 (/REMOTE_PARENT_DIR/0x200017353:0x31d7:0x0/???)
Connect to /lost+found? no
'..' in ... (520110725) is /REMOTE_PARENT_DIR/0x200017353:0x31d7:0x0 (520110721), should be <The NULL inode> (0).
Fix? no

Unconnected directory inode 2024701062 (/REMOTE_PARENT_DIR/0x20001927c:0x5672:0x0/???)
Connect to /lost+found? no
'..' in ... (2024701062) is /REMOTE_PARENT_DIR/0x20001927c:0x5672:0x0 (2024701057), should be <The NULL inode> (0).
Fix? no

Unconnected directory inode 2696236472 (/REMOTE_PARENT_DIR/0x20001a9e1:0x1:0x0/CL_racer_long_01.7.NE6nhk.1637652210/???)
Connect to /lost+found? no
'..' in ... (2696236472) is /REMOTE_PARENT_DIR/0x20001a9e1:0x1:0x0/CL_racer_long_01.7.NE6nhk.1637652210 (2696236451), should be <The NULL inode> (0).
Fix? no
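To make the follow-up easier to repeat, the inode numbers and parent FIDs can be pulled out of the e2fsck messages mechanically. The following is only a rough sketch: the helper names `extract_targets` and `followup_commands` are hypothetical, and the regular expression assumes the message wording shown in this report.

```python
import re

# Matches the "Unconnected directory inode" message for entries under
# /REMOTE_PARENT_DIR and captures the inode number and the parent FID.
# Assumption: the message format is exactly as it appears in this ticket.
PATTERN = re.compile(
    r"Unconnected directory inode (\d+) "
    r"\(/REMOTE_PARENT_DIR/(0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+)"
)


def extract_targets(e2fsck_text):
    """Return a list of (inode, parent_fid) pairs found in e2fsck output."""
    return [(int(ino), fid) for ino, fid in PATTERN.findall(e2fsck_text)]


def followup_commands(e2fsck_text, device, mountpoint):
    """Build the debugfs and lfs command lines used later in this report."""
    cmds = []
    for ino, fid in extract_targets(e2fsck_text):
        cmds.append(f'debugfs -c -R "stat <{ino}>" {device}')
        cmds.append(f"lfs fid2path {mountpoint} {fid}")
    return cmds
```

For example, `followup_commands(log_text, "/dev/md65", "/lus/kjcf05")` would produce one `debugfs` and one `lfs fid2path` command line per reported inode; the commands are only generated as strings, not executed.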

I grabbed the debugfs stat and lfs fid2path details:

[root@kjcf05n02 ~]# debugfs -c -R "stat <520110725>" /dev/md65
debugfs 1.45.6.cr1 (14-Aug-2020)
/dev/md65: catastrophic mode - not reading inode or group bitmaps
Inode: 520110725   Type: directory    Mode:  0000   Flags: 0x80000
Generation: 913323086    Version: 0x00000000:00000000
User:     0   Group:     0   Project:     0   Size: 4096
File ACL: 0
Links: 2   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x61986932:39ccc944 -- Fri Nov 19 21:19:14 2021
 atime: 0x00000000:fffffff8 -- Wed Dec 31 18:00:00 1969
 mtime: 0x00000000:fffffff8 -- Wed Dec 31 18:00:00 1969
crtime: 0x61986932:39ccc944 -- Fri Nov 19 21:19:14 2021
Size of extra inode fields: 32
Extended attributes:
  trusted.lma (24) = 00 00 00 00 02 00 00 00 58 93 01 40 02 00 00 00 7a 88 00 00 00 00 00 00
  lma: fid=[0x240019358:0x887a:0x0] compat=0 incompat=2
EXTENTS:
(0):621576195
[root@kjcf05n02 ~]#
[root@kjcf05n02 ~]# debugfs -c -R "stat <2024701062>" /dev/md65
debugfs 1.45.6.cr1 (14-Aug-2020)
/dev/md65: catastrophic mode - not reading inode or group bitmaps
Inode: 2024701062   Type: directory    Mode:  0000   Flags: 0x80000
Generation: 1482031855    Version: 0x00000000:00000000
User:     0   Group:     0   Project:     0   Size: 4096
File ACL: 0
Links: 2   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x61984179:dd195ee4 -- Fri Nov 19 18:29:45 2021
 atime: 0x00000000:fffffff8 -- Wed Dec 31 18:00:00 1969
 mtime: 0x00000000:fffffff8 -- Wed Dec 31 18:00:00 1969
crtime: 0x61984179:dd195ee4 -- Fri Nov 19 18:29:45 2021
Size of extra inode fields: 32
Extended attributes:
  trusted.lma (24) = 00 00 00 00 02 00 00 00 d5 92 01 40 02 00 00 00 75 79 01 00 00 00 00 00
  lma: fid=[0x2400192d5:0x17975:0x0] compat=0 incompat=2
EXTENTS:
(0):2418999300
[root@kjcf05n02 ~]#
[root@kjcf05n02 ~]# debugfs -c -R "stat <2696236472>" /dev/md65
debugfs 1.45.6.cr1 (14-Aug-2020)
/dev/md65: catastrophic mode - not reading inode or group bitmaps
Inode: 2696236472   Type: directory    Mode:  0000   Flags: 0x80000
Generation: 3049747120    Version: 0x00000000:00000000
User:     0   Group:     0   Project:     0   Size: 4096
File ACL: 0
Links: 2   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x619c9707:1b0cbab0 -- Tue Nov 23 01:23:51 2021
 atime: 0x00000000:fffffff8 -- Wed Dec 31 18:00:00 1969
 mtime: 0x00000000:fffffff8 -- Wed Dec 31 18:00:00 1969
crtime: 0x619c9707:1b0cbab0 -- Tue Nov 23 01:23:51 2021
Size of extra inode fields: 32
Extended attributes:
  trusted.lma (24) = 00 00 00 00 02 00 00 00 35 c9 01 40 02 00 00 00 b0 1a 00 00 00 00 00 00
  lma: fid=[0x24001c935:0x1ab0:0x0] compat=0 incompat=2
EXTENTS:
(0):3221488812 
[root@kjcf05n02 ~]#
walleye-p5:/home/users/talbers/bin # lfs fid2path /lus/kjcf05 0x200017353:0x31d7:0x0
/lus/kjcf05/flash/ostest.vers/alsorun.20211119101703.3734.walleye-p5/CL_racer_long_01.8.muPgaK.1637378259/
walleye-p5:/home/users/talbers/bin # lfs fid2path /lus/kjcf05 0x20001927c:0x5672:0x0
/lus/kjcf05/disk/ostest.vers/alsorun.20211119101703.3734.walleye-p5/CL_racer_long_01.7.QPfrFU.1637368134
walleye-p5:/home/users/talbers/bin # lfs fid2path /lus/kjcf05 0x20001a9e1:0x1:0x0
/lus/kjcf05/disk/ostest.vers/alsorun.20211123004402.2426.walleye-p5
walleye-p5:/home/users/talbers/bin # 
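For cross-checking the dumps above, the raw `trusted.lma` bytes can be decoded by hand into the `lma:` line that debugfs prints. This is a minimal sketch, assuming the 24-byte value is little-endian and laid out as compat (u32), incompat (u32), FID sequence (u64), FID object ID (u32), FID version (u32); that layout is inferred from the three dumps in this report, and `decode_lma` is a hypothetical helper name.

```python
import struct


def decode_lma(hex_bytes):
    """Decode a space-separated hex dump of trusted.lma into its fields.

    Assumed layout (little-endian): u32 compat, u32 incompat,
    u64 FID sequence, u32 FID object ID, u32 FID version.
    """
    raw = bytes.fromhex(hex_bytes)  # fromhex() ignores the spaces
    compat, incompat, seq, oid, ver = struct.unpack("<IIQII", raw[:24])
    return {
        "compat": compat,
        "incompat": incompat,
        "fid": f"[{seq:#x}:{oid:#x}:{ver:#x}]",
    }


# Bytes taken from the stat output of inode 520110725.
print(decode_lma(
    "00 00 00 00 02 00 00 00 58 93 01 40 02 00 00 00 7a 88 00 00 00 00 00 00"
))
```

Under that assumption the first dump decodes to compat=0, incompat=2 and fid=[0x240019358:0x887a:0x0], which agrees with the `lma:` line debugfs printed for inode 520110725.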

I will upload the e2image and e2fsck logs to the FTP server.



 Comments   
Comment by Prasannakumar Nagasubramani [ 07/Dec/21 ]

The logs are available on the FTP server.

ftp> pwd
257 "/uploads/LU15327" 
Comment by Andreas Dilger [ 07/Dec/21 ]

This seems similar to LU-14168, but doesn't describe how the errors were introduced to the filesystem, or what version of e2fsck was used.

The image is less interesting than having a specific (and short) process to recreate this issue.

Comment by Prasannakumar Nagasubramani [ 09/Dec/21 ]

FYI, the I/O command is below. I ran the Lustre FOFB (stress) test prior to this.

cmdline="ubrun -s "We Survived" -t -D -e APPS_CL -o -x -T CL_racer_long_01 -a $APPS_CL/filesystem/racer/RUN aptrun -A bypasstrans=1 -n 2 -M 8 APPS_CL=/filesystem/racer/RUN/racer.sh -t 600 -d $TMPDIR/CL_racer_long_01 -T 7 -f 20 -c" 
Comment by Andreas Dilger [ 09/Dec/21 ]

Unfortunately, racer is doing "a lot of random things", so it isn't necessarily clear what it was doing to cause this.

Generated at Sat Feb 10 03:17:22 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.