[LU-11549] Unattached inodes after 3 min racer run. Created: 19/Oct/18  Updated: 12/Mar/20  Resolved: 11/Oct/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.12.0
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Major
Reporter: Alexander Zarochentsev Assignee: Alexander Zarochentsev
Resolution: Fixed Votes: 0
Labels: None

Attachments: racer-mod.tar.gz, unconnected.txt.gz
Issue Links:
Related
is related to LU-11446 ldiskfs inodes nlink mismatch with DNE Open
is related to LU-11130 cross-target rename creates invalid s... Resolved
is related to LU-3537 allow cross-MDT for all metadata oper... Resolved
is related to LU-12848 Add test case for LU-11549 Resolved
is related to LU-13346 Fix link and rename race on zfs odb Resolved
is related to LU-11706 create a lustre tunable to enable/dis... Resolved
Severity: 3

 Description   

An attempt to run racer.sh on a DNE system with rpms built from wc master branch:

[root@cslmodev100 racer]# sh racer.sh  -t 180 -T 7 -f 20 -c -d /mnt/testfs/racer-dir/
Directory:     /mnt/testfs/racer-dir/
Time Limit:    180
Lustre Tests:  1
Max Files:     20
Threads:       7
Running Tests: lustre_file_create dir_create file_rm file_rename file_link file_symlink file_list file_concat file_exec dir_remote
MDS Count:     3
Running racer.sh for 180 seconds. CTRL-C to exit
file_create: FILE=/mnt/testfs/racer-dir//10 SIZE=207136
file_create: FILE=/mnt/testfs/racer-dir//17 SIZE=4800
file_create: FILE=/mnt/testfs/racer-dir//5 SIZE=73416
file_create: FILE=/mnt/testfs/racer-dir//19 SIZE=116024
...
file_create: FILE=/mnt/testfs/racer-dir//0 SIZE=234400
file_create: FILE=/mnt/testfs/racer-dir//2 SIZE=136432
file_create: FILE=/mnt/testfs/racer-dir//8 SIZE=53296
file_create: FILE=/mnt/testfs/racer-dir//12 SIZE=233528
racer cleanup
sleeping 5 sec ...
lustre_file_create.sh: no process found
dir_create.sh: no process found
file_rm.sh: no process found
file_rename.sh: no process found
file_link.sh: no process found
file_symlink.sh: no process found
file_list.sh: no process found
file_concat.sh: no process found
file_exec.sh: no process found
dir_remote.sh: no process found
there should be NO racer processes:
root     201964  0.0  0.0 112660   988 pts/24   S+   11:03   0:00 grep -E lustre_file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec|dir_remote
Filesystem                                   1K-blocks    Used    Available Use% Mounted on
172.18.1.3@o2ib1,172.18.1.4@o2ib1:/testfs 240559470792 4289196 238129624412   1% /mnt/testfs
We survived racer.sh for 180 seconds.

 
e2fsck on the MDT0 device:

[root@cslmodev103 ~]# umount /dev/md66
[root@cslmodev103 ~]# e2fsck -fvn /dev/md66
e2fsck 1.42.13.x6 (01-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 2061541938 ref count is 3, should be 2.  Fix? no

Unattached inode 2061541970
Connect to /lost+found? no

Inode 2061541986 ref count is 2, should be 1.  Fix? no

Unattached inode 2061542181
Connect to /lost+found? no

Inode 2061542575 ref count is 10, should be 9.  Fix? no

Inode 2061542583 ref count is 6, should be 5.  Fix? no

Pass 5: Checking group summary information
[QUOTA WARNING] Usage inconsistent for ID 0:actual (1248931840, 1295) != expected (1248919552, 1295)
Update quota info for quota type 0? no

[QUOTA WARNING] Usage inconsistent for ID 0:actual (1248931840, 1295) != expected (1248919552, 1295)
Update quota info for quota type 1? no


testfs-MDT0000: ********** WARNING: Filesystem still has errors **********


        1304 inodes used (0.00%, out of 3042005760)
          85 non-contiguous files (6.5%)
           2 non-contiguous directories (0.2%)
             # of inodes with ind/dind/tind blocks: 72/64/0
   381833025 blocks used (25.10%, out of 1520996090)
           0 bad blocks
           2 large files

         555 regular files
         491 directories
           0 character device files
           0 block device files
           0 fifos
         107 links
         249 symbolic links (249 fast symbolic links)
           0 sockets
------------
        1400 files
[root@cslmodev103 ~]#

[root@cslmodev103 ~]# dumpe2fs -h /dev/md66 | grep -i state
dumpe2fs 1.42.13.x6 (01-Mar-2018)
Filesystem state:         clean
[root@cslmodev103 ~]#

Invalid symlink inodes are due to LU-11130 and wrong nlink counts are due to LU-11446;
the unattached inodes are what this ticket is about.

The racer test script doesn't use migrate or striped dirs,
just "lustre_file_create dir_create file_rm file_rename file_link file_symlink file_list file_concat file_exec dir_remote". There are also no failovers.

Lustre is built from the tip of the wc master branch:

$ git log --oneline wc/master
fe7c13bd48 (wc/master) LU-11329 utils: create tests maintainers list
70a01a6c9c LU-11276 ldlm: don't apply ELC to converting and DOM locks
72372486a5 LU-11347 osd: do not use pagecache for I/O
8b9105d828 LU-11199 mdt: Attempt lookup lock on open
697e8fe6f3 LU-11473 doc: add lfs-getsom man page
ed0c19d250 LU-1095 misc: quiet console messages at startup
....
[root@cslmodev103 ~]# rpm -q lustre_ib
lustre_ib-2.11.56_16_gfe7c13b-1.el7.centos.x86_64
[root@cslmodev103 ~]#


 Comments   
Comment by Peter Jones [ 25/Oct/18 ]

Zam

Have you confirmed whether the situation improves with your proposed fix for LU-11130?

Peter

Comment by Cory Spitz [ 25/Oct/18 ]

The bug still exists with the patch from LU-11130.

However, we've found that the problem is no longer reproducible after disabling the cross-MDT capability provided with LU-3537 (but leaving the async update capability from LU-3534).

Comment by Alexander Zarochentsev [ 13/Nov/18 ]

Peter,
>Have you confirmed whether the situation improves with your proposed fix for LU-11130

No, the unattached inodes have nothing to do with LU-11130. We have had LU-11130 landed for some time already, and it has no effect on this bug.

This issue is about regular files: a racer.sh test (attached) without striped dirs or migrate, no failovers, and no e2fsck runs except the final one after racer.sh completes. Actually, it was harder to reproduce the issue on master because it just hangs.

Here are the results of a 10-minute racer run:

e2fsck after racer. The fs is still mounted this time, but nobody has accessed it for about 30 seconds since racer completed; believe me, the corruption is also there when the fs is cleanly unmounted.

e2fsck 1.42.13.x6 (01-Mar-2018)
device /dev/md66 mounted by lustre
Warning!  /dev/md66 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 2061541921 ref count is 3, should be 2.  Fix? no

Unattached inode 2061541929
Connect to /lost+found? no
...

The inode stat:

[root@cslmodev103 ~]# debugfs -c -R "stat <2061541929>" /dev/md66
debugfs 1.42.13.x6 (01-Mar-2018)
/dev/md66: catastrophic mode - not reading inode or group bitmaps
Inode: 2061541929   Type: regular    Mode:  0644   Flags: 0x0
Generation: 2673176195    Version: 0x00000024:007990ae
User:     0   Group:     0   Project:     0   Size: 0
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x5bd09024:00000000 -- Wed Oct 24 10:30:44 2018
atime: 0x5bd09024:00000000 -- Wed Oct 24 10:30:44 2018
mtime: 0x5bd09024:00000000 -- Wed Oct 24 10:30:44 2018
crtime: 0x5bd0902b:51e10c54 -- Wed Oct 24 10:30:51 2018
Size of extra inode fields: 32
Extended attributes stored in inode body:
  lma = "00 00 00 00 04 00 00 00 0f 50 01 00 02 00 00 00 54 58 00 00 00 00 00 00 " (24)
  lma: fid=[0x20001500f:0x5854:0x0] compat=0 incompat=4

  lov = "d0 0b d1 0b 01 00 00 00 54 58 00 00 00 00 00 00 0f 50 01 00 02 00 00 00 00 00 10 00 01 00 00 00 b1 b7 dc 06 01 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 01 00 00 00 " (56)
  selinux = "unlabeled\000" (10)
  link = "df f1 ea 11 02 00 00 00 3e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 13 00 00 00 02 00 01 50 0b 00 01 85 f4 00 00 00 00 38 00 13 00 0
0 00 02 c0 01 05 d3 00 00 79 c7 00 00 00 00 38 " (62)
BLOCKS:

[root@cslmodev103 ~]#

The FIDs in the link EA, decoded:

fid1 = [0x20001500b:0x185f4:0x0]
fid2 = [0x2c00105d3:0x79c7:0x0]
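
For completeness, the two parent FIDs can be read straight out of the raw trusted.link bytes shown above. A minimal decode sketch, assuming the on-disk layout from lustre_idl.h (little-endian header; big-endian reclen and parent FID per entry); this is illustrative code, not the in-tree parser:

#include <stdint.h>
#include <stdio.h>

/* On-disk link EA: a 24-byte header followed by leh_reccount entries. */
struct link_ea_header {
	uint32_t leh_magic;		/* LINK_EA_MAGIC (0x11EAF1DF) */
	uint32_t leh_reccount;		/* number of link entries */
	uint64_t leh_len;		/* total EA size in bytes */
	uint32_t leh_overflow_time;
	uint32_t leh_padding;
};

/* Each entry: 2-byte big-endian reclen, 16-byte big-endian parent FID
 * (64-bit seq, 32-bit oid, 32-bit ver), then the unterminated name. */
static void decode_link_ea(const unsigned char *buf, size_t len)
{
	const struct link_ea_header *leh = (const void *)buf;
	const unsigned char *p = buf + sizeof(*leh);
	uint32_t i;

	for (i = 0; i < leh->leh_reccount && p + 18 <= buf + len; i++) {
		uint16_t reclen = (p[0] << 8) | p[1];
		uint64_t seq = 0;
		uint32_t oid, ver;
		int j;

		if (reclen < 18 || p + reclen > buf + len)
			break;		/* corrupt entry */
		for (j = 0; j < 8; j++)
			seq = (seq << 8) | p[2 + j];
		oid = (uint32_t)p[10] << 24 | p[11] << 16 | p[12] << 8 | p[13];
		ver = (uint32_t)p[14] << 24 | p[15] << 16 | p[16] << 8 | p[17];
		printf("parent FID [0x%llx:0x%x:0x%x] name \"%.*s\"\n",
		       (unsigned long long)seq, oid, ver,
		       (int)(reclen - 18), (const char *)(p + 18));
		p += reclen;
	}
}

Run against the 62-byte dump above, this prints the two parent FIDs listed here, each with the entry name "8".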

Those FIDs are found on the Lustre fs:

[root@cslmodev100 ~]# lfs path2fid /mnt/testfs/racer-dir
[0x20001500b:0x185f4:0x0]
[root@cslmodev100 ~]# lfs path2fid /mnt/testfs/racer-dir/18/11/3/8/14/0/12/19/0/6/2/9/13/12/7/9/15/0/3/10/6/5/4/6/1/4/9/10/7/19/12/10/7/3/18/15/19/8/16/5/10/18/1/6/11/5/3/1/5/11/13/9/10/6/9/15/11/17/11/5/1/0/2/6/4/3/15/8/16/17/0/13/1/14/10/18/7/10/15/2/16/19/10/7/13/4/7/8/8/8/18/15/7/3/1/0/8/10/3/15/2/5/19/10/6/18/12/1/11/9/4/2/1/9/8/11/17/14/8/15/17/9/11/8/10/17/10/9/7/0/16/2/14/4/10/1/17/3/15/5/10/14/17/6/7/19/18/14/16/6/3/1/6/10/5/9/11/0/13/1/11/13/5/9/6/10/14/2/8/13/17/13/10/5/2/4/16/3/10/12/14/11/19/8/15/12/1/7/2/7/13
[0x2c00105d3:0x79c7:0x0]
[root@cslmodev100 ~]#

Moreover, there is an "8" file name in the second dir, but it does not allow the file to be accessed:

[root@cslmodev100 ~]# ls -l /mnt/testfs/racer-dir/18/11/3/8/14/0/12/19/0/6/2/9/13/12/7/9/15/0/3/10/6/5/4/6/1/4/9/10/7/19/12/10/7/3/18/15/19/8/16/5/10/18/1/6/11/5/3/1/5/11/13/9/10/6/9/15/11/17/11/5/1/0/2/6/4/3/15/8/16/17/0/13/1/14/10/18/7/10/15/2/16/19/10/7/13/4/7/8/8/8/18/15/7/3/1/0/8/10/3/15/2/5/19/10/6/18/12/1/11/9/4/2/1/9/8/11/17/14/8/15/17/9/11/8/10/17/10/9/7/0/16/2/14/4/10/1/17/3/15/5/10/14/17/6/7/19/18/14/16/6/3/1/6/10/5/9/11/0/13/1/11/13/5/9/6/10/14/2/8/13/17/13/10/5/2/4/16/3/10/12/14/11/19/8/15/12/1/7/2/7/13/8
ls: cannot access /mnt/testfs/racer-dir/18/11/3/8/14/0/12/19/0/6/2/9/13/12/7/9/15/0/3/10/6/5/4/6/1/4/9/10/7/19/12/10/7/3/18/15/19/8/16/5/10/18/1/6/11/5/3/1/5/11/13/9/10/6/9/15/11/17/11/5/1/0/2/6/4/3/15/8/16/17/0/13/1/14/10/18/7/10/15/2/16/19/10/7/13/4/7/8/8/8/18/15/7/3/1/0/8/10/3/15/2/5/19/10/6/18/12/1/11/9/4/2/1/9/8/11/17/14/8/15/17/9/11/8/10/17/10/9/7/0/16/2/14/4/10/1/17/3/15/5/10/14/17/6/7/19/18/14/16/6/3/1/6/10/5/9/11/0/13/1/11/13/5/9/6/10/14/2/8/13/17/13/10/5/2/4/16/3/10/12/14/11/19/8/15/12/1/7/2/7/13/8: No such file or directory
[root@cslmodev100 ~]#

In further investigation I found that the FID lookup for [0x20001500f:0x5854:0x0] fails
(return code 0 means the FID was not found):

00000004:00000040:14.0:1540652178.108296:0:90739:0:(mdt_handler.c:5441:mdt_object_init()) object init, fid = [0x20001500f:0x5854:0x0]
00000004:00000010:14.0:1540652178.108298:0:90739:0:(mdd_object.c:282:mdd_object_alloc()) slab-alloced 'mdd_obj': 96 at ffff880f83b741e0.
...
00080000:00000001:14.0:1540652178.108324:0:90739:0:(osd_oi.c:543:osd_oi_iam_lookup()) Process entered
00000001:00000010:14.0:1540652178.108338:0:90739:0:(osd_dynlocks.c:109:dynlock_lock()) slab-alloced 'nhl': 72 at ffff88101b532ee8.
00000001:00000010:14.0:1540652178.108342:0:90739:0:(osd_dynlocks.c:193:dynlock_unlock()) slab-freed 'hl': 72 at ffff88101b532ee8.
00080000:00000001:14.0:1540652178.108344:0:90739:0:(osd_oi.c:565:osd_oi_iam_lookup()) Process leaving (rc=0 : 0 : 0)

Conclusions:

  1. the unattached inode on MDT0 actually has one link from MDT1, and its nlink counter value of 1 is valid;
  2. the link from MDT1 doesn't work because the FID lookup finds no inode on MDT0;
  3. the inode has no name on MDT0, which is a format violation; at the very least it should be connected to /REMOTE_PARENT_DIR/;
  4. the OI on MDT0 has no record for [0x20001500f:0x5854:0x0] (the unattached inode's FID);
  5. one of the link EA records is valid (the corresponding forward link exists), the other is not valid (there is no link "racer-dir/8").
Comment by Alexander Zarochentsev [ 13/Nov/18 ]

More info from another run, where I managed to get Lustre logs with almost full debug (trace, inode, info, other) and added debug code to the OUT code:

The unattached inode FID is [0x20001ec49:0xab67:0x0] this time. Below is a grep for it through the Lustre log:

00000004:00000002:16.0:1541058276.560428:0:61806:0:(mdt_reint.c:72:mdt_obj_version_get()) FID [0x20001ec49:0xab67:0x0] version is 0x37000634a7
00080000:00000002:16.0:1541058276.560531:0:61806:0:(osd_handler.c:312:osd_idc_find_and_init()) testfs-MDT0000: FID [0x20001ec49:0xab67:0x0] not in the id map cache
00080000:00000002:16.0:1541058276.560677:0:61806:0:(osd_handler.c:3971:osd_ref_del()) [0x20001ec49:0xab67:0x0] decrease nlink 1
00080000:00000002:16.0:1541058276.560691:0:61806:0:(osd_handler.c:5864:osd_index_ea_insert()) parent 1985033217 insert 0x20001ec49:0xab67:0x0:2061542000 rc = 0
00080000:00000002:16.0:1541058276.560694:0:61806:0:(osd_handler.c:3882:osd_ref_add()) [0x20001ec49:0xab67:0x0] increase nlink 0
00000004:00000002:16.0:1541058276.560734:0:61806:0:(mdt_handler.c:736:mdt_pack_attr2body()) [0x20001ec49:0xab67:0x0]: nlink=1, mode=100644, valid=0x100000000002f8f
00010000:00000040:34.0:1541058276.561254:0:26965:0:(ldlm_resource.c:1701:ldlm_resource_dump()) --- Resource: [0x20001ec49:0xab67:0x0].0x0 (ffff8810196f4f00) refcount = 1
00000020:00001000:6.0:1541058276.561752:0:61820:0:(out_handler.c:1085:out_handle()) ... testfs-MDT0000: buf 0, update 0, opc = 6 [0x20001ec49:0xab67:0x0]
00010000:00000040:13.0:1541058276.570670:0:26966:0:(ldlm_resource.c:1701:ldlm_resource_dump()) --- Resource: [0x20001ec49:0xab67:0x0].0x0 (ffff8810196f4f00) refcount = 2
00000020:00001000:6.0:1541058276.571050:0:61820:0:(out_handler.c:1085:out_handle()) ... testfs-MDT0000: buf 0, update 0, opc = 8 [0x20001ec49:0xab67:0x0]
00000020:00000040:6.0:1541058276.571056:0:61820:0:(out_handler.c:295:out_xattr_get()) testfs-MDT0000: [0x20001ec49:0xab67:0x0] get xattr trusted.link len 43: rc = 43
00000020:00001000:6.0:1541058276.571722:0:61820:0:(out_handler.c:1085:out_handle()) ... testfs-MDT0000: buf 0, update 0, opc = 3 [0x20001ec49:0xab67:0x0]
00080000:00000002:6.0:1541058276.571726:0:61820:0:(osd_handler.c:312:osd_idc_find_and_init()) testfs-MDT0000: FID [0x20001ec49:0xab67:0x0] not in the id map cache
00000020:00001000:6.0:1541058276.571728:0:61820:0:(out_handler.c:1085:out_handle()) ... testfs-MDT0000: buf 0, update 1, opc = 7 [0x20001ec49:0xab67:0x0]
00080000:00000002:6.0:1541058276.571774:0:61820:0:(osd_handler.c:3882:osd_ref_add()) [0x20001ec49:0xab67:0x0] increase nlink 1
00080000:00000002:6.0:1541058276.571782:0:61820:0:(osd_handler.c:4382:osd_xattr_set()) [0x20001ec49:0xab67:0x0] set xattr 'trusted.link' with size 62
00080000:00000002:6.0:1541058276.571791:0:61820:0:(osd_compat.c:264:osd_add_to_remote_parent()) md66: add agent entry [0x20001ec49:0xab67:0x0]
00080000:00000002:6.0:1541058276.571799:0:61820:0:(osd_compat.c:296:osd_add_to_remote_parent()) testfs-MDT0000: create agent entry for 0x20001ec49:0xab67:0x0: rc = 0
00000004:00000002:13.0:1541058276.615091:0:61817:0:(mdt_open.c:2214:mdt_mfd_close()) testfs-MDT0000: close file [0x20001ec49:0xab67:0x0] with intent: 0
00000004:00000002:13.0:1541058276.615095:0:61817:0:(mdd_object.c:1155:mdd_attr_set()) testfs-MDD0000: no valid attribute on [0x20001ec49:0xab67:0x0], previousvalid is 0x1
00080000:00000002:13.0:1541058276.615139:0:61817:0:(osd_handler.c:312:osd_idc_find_and_init()) testfs-MDT0000: FID [0x20001ec49:0xab67:0x0] not in the id map cache
00080000:00000002:13.0:1541058276.615239:0:61817:0:(osd_handler.c:3971:osd_ref_del()) [0x20001ec49:0xab67:0x0] decrease nlink 2
00080000:00000002:13.0:1541058276.615246:0:61817:0:(osd_compat.c:318:osd_delete_from_remote_parent()) md66: remove agent direntry [0x20001ec49:0xab67:0x0]
00080000:00000002:13.0:1541058276.615253:0:61817:0:(osd_compat.c:340:osd_delete_from_remote_parent()) testfs-MDT0000: remove agent entry for 0x20001ec49:0xab67:0x0: rc = 0
00080000:00000002:13.0:1541058276.615255:0:61817:0:(osd_oi.c:817:osd_oi_delete()) md66: delete [0x20001ec49:0xab67:0x0]
00010000:00000040:13.0:1541058276.615329:0:61817:0:(ldlm_resource.c:1701:ldlm_resource_dump()) --- Resource: [0x20001ec49:0xab67:0x0].0x0 (ffff8810196f4f00) refcount = 2
00010000:00000040:13.0:1541058276.615332:0:61817:0:(ldlm_resource.c:1707:ldlm_resource_dump()) ### ### ns: mdt-testfs-MDT0000_UUID lock: ffff88078a251680/0x94311686d2f0ef67 lrc: 2/0,0 mode: EX/EX res: [0x20001ec49:0xab67:0x0].0x0 bits 0x2/0x0 rrc: 3 type: IBT flags: 0x40000001000000 nid: 172.18.1.8@o2ib1 remote: 0x54fc01ada1ff65c1 expref: 9 pid: 61820 timeout: 0 lvb_type: 0
00000004:00000040:13.0:1541058276.615373:0:61817:0:(mdt_handler.c:5461:mdt_object_free()) object free, fid = [0x20001ec49:0xab67:0x0]

For a moment, the nlink counter dropped to zero:

00080000:00000002:16.0:1541058276.560694:0:61806:0:(osd_handler.c:3882:osd_ref_add()) [0x20001ec49:0xab67:0x0] increase nlink 0

I think this is not a valid situation; the VFS warns about such cases, but Lustre avoids the nlink increase from zero (LU-4416, http://review.whamcloud.com/11571).
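
For context, a minimal sketch of what "avoids the nlink increase from zero" can mean at the osd level (my reading of LU-4416; illustrative, not the in-tree osd_ref_add()): the plain VFS inc_nlink() warns when called on a zero link count, so the osd has to step around it:

#include <linux/fs.h>

/* Sketch only (assumed mechanism, not the actual Lustre code):
 * inc_nlink() WARNs if i_nlink is already 0, because bumping a zero
 * link count normally means resurrecting an object that is being
 * destroyed; set_nlink() performs the same change silently. */
static void ref_add_sketch(struct inode *inode)
{
	if (inode->i_nlink == 0)
		set_nlink(inode, 1);
	else
		inc_nlink(inode);
}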

Almost at the same time, thread 61817 closes the file; it sees nlink = 0 (my guess) and continues with osd_delete_from_remote_parent() and osd_oi_delete(), which causes the corruption explained in the previous comment.
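
Putting the grepped lines together, the suspected interleaving is (reconstructed from the log above; "increase/decrease nlink N" is read as N being the value before the operation):

61806 (rename):   osd_ref_del()                   nlink 1 -> 0
61806 (rename):   osd_ref_add()                   nlink 0 -> 1   <- resurrects a dying object
61820 (OUT link): osd_ref_add()                   nlink 1 -> 2
61820 (OUT link): osd_add_to_remote_parent()      agent entry created
61817 (close):    osd_ref_del()                   nlink 2 -> 1
61817 (close):    osd_delete_from_remote_parent()
61817 (close):    osd_oi_delete()                 OI record gone, while MDT1 still holds a name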

Comment by Alexander Zarochentsev [ 14/Nov/18 ]

Attaching unconnected.txt.gz, a Lustre log from MDT0 showing nlink dropping to zero and the file being destroyed.

Comment by Alexander Zarochentsev [ 17/Dec/18 ]

More debugging showed the following fs corruption scenario:

1. A client opens file [0x20001500b:0x185f4:0x0]/9, FID=[0x200020b8a:0x1c3d:0x0], and keeps the fd open:

00000004:00000002:10.0:1544282653.825766:0:105088:0:(mdt_open.c:1319:mdt_reint_open()) I am going to open [0x20001500b:0x185f4:0x0]/(9->[0x200020b8a:0x1c3d:0x0]) cr_flag=02102 mode=0100666 msg_flag=0x0

2. A rename of [0x20001500b:0x185f4:0x0]/4 to [0x20001500b:0x185f4:0x0]/9 unlinks the target file:

0000004:00000002:16.0:1544282653.939918:0:105058:0:(mdt_reint.c:1880:mdt_reint_rename_internal()) @@@ rename [0x20001500b:0x185f4:0x0]/4 to [0x20001500b:0x185f4:0x0]/9  req@ffff880cf46

00000004:00000001:16.0:1544282653.942415:0:105058:0:(mdd_dir.c:3039:mdd_rename()) Process entered

and puts the object onto the orphan list because the file is still open:

00000004:00080000:16.0:1544282653.942707:0:105058:0:(mdd_dir.c:1663:mdd_finish_unlink()) Object [0x200020b8a:0x1c3d:0x0] is inserted into orphan list, open count = 1

3. Somehow in parallel with (2), a cross-MDT transaction adds a reference to [0x200020b8a:0x1c3d:0x0] (increases the nlink count):

00080000:00000002:17.0:1544282653.996839:0:105655:0:(osd_handler.c:3882:osd_ref_add()) [0x200020b8a:0x1c3d:0x0] increase nlink 1

It is part of a compound "out" update:

00000020:00001000:17.0:1544282653.996777:0:105655:0:(out_handler.c:997:out_handle()) testfs-MDT0000: out update request, 1 buffers
...
00000020:00001000:17.0:1544282653.996782:0:105655:0:(out_handler.c:1085:out_handle()) ... testfs-MDT0000: buf 0, update 0, opc = 3 [0x200020b8a:0x1c3d:0x0]
...
00000020:00001000:17.0:1544282653.996789:0:105655:0:(out_handler.c:1085:out_handle()) ... testfs-MDT0000: buf 0, update 1, opc = 7 [0x200020b8a:0x1c3d:0x0]
...
00000020:00001000:17.0:1544282653.996792:0:105655:0:(out_handler.c:1085:out_handle()) ... testfs-MDT0000: buf 0, update 2, opc = 12 [0x20001fbeb:0x4:0x0]
...

I think the cross-target update is a link() operation, creating a file name on another MDT.

4. Finally, the file close removes the object from the orphan list and from the OI (because it has the ORPHAN_OBJ and DEAD_OBJ flags set), creating the fs corruption reported by e2fsck as an "unattached inode".

I do not know what exactly is broken yet; I suspect the cross-target link does not properly check for the ORPHAN_OBJ | DEAD_OBJ flags.
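
If that suspicion is right, the missing piece would be a target check in the link path along these lines (a sketch under that assumption; the function is hypothetical, though mdd objects do carry mod_flags with ORPHAN_OBJ and DEAD_OBJ):

/* Hypothetical sanity check for the link path: refuse to add a new
 * name for a target that is already on the orphan list or marked dead. */
static int link_target_sanity(const struct mdd_object *tgt)
{
	if (tgt->mod_flags & (ORPHAN_OBJ | DEAD_OBJ))
		return -ENOENT;		/* object is being destroyed */
	return 0;
}

As the later root-cause update explains, the catch is that mod_flags only exists on the MDT that owns the object, so the MDT performing a cross-target link cannot see it.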

 

Comment by Gerrit Updater [ 31/May/19 ]

Alexander Zarochentsev (c17826@cray.com) uploaded a new patch: https://review.whamcloud.com/35021
Subject: LU-11549 mdt: take UPDATE lock for rename target
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4ff2700af396e63fd9ac90d5f51f1129465c8df7

Comment by Colin Faber [X] (Inactive) [ 25/Jun/19 ]

Peter Jones - any chance we can get someone to review this?

Comment by Alexander Zarochentsev [ 20/Jul/19 ]

Root cause update:

0. Locks are OK.

1. The mdt_reint_link operation on a remote object cannot perform a reliable check of that object. We want to check the mdd object for the DEAD_OBJ flag but cannot, because DEAD_OBJ is an mdd object flag: it exists on the remote MDT and is not visible through the OSP/OUT interface, where out_attr_get() checks only the dt object state. So mdd_link_sanity_check() passes and the illegal link operation is performed.

2. A solution could be to move the DEAD_OBJ flag to the lu_object level.
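
A minimal sketch of the direction the landed fix takes instead (see the mdd_mark_orphan_object() diff quoted further down): persist the orphan state as a flags attribute, which a remote MDT then sees via attr-get. The wrapper below is illustrative; only the LA_FLAGS/LUSTRE_ORPHAN_FL idea is from the actual patch:

/* Sketch: when an open-unlinked object goes onto the orphan list,
 * record LUSTRE_ORPHAN_FL persistently so that OSP/OUT attr-get on a
 * remote MDT can observe it and fail the illegal link. */
static int mark_orphan_sketch(const struct lu_env *env,
			      struct mdd_object *obj,
			      struct thandle *handle)
{
	struct lu_attr attr = {
		.la_valid = LA_FLAGS,
		.la_flags = LUSTRE_ORPHAN_FL,
	};

	return mdo_attr_set(env, obj, &attr, handle);
}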

Comment by Gerrit Updater [ 02/Aug/19 ]

Alexander Zarochentsev (c17826@cray.com) uploaded a new patch: https://review.whamcloud.com/35683
Subject: LU-11549 tests: link succeded to an ophan remote object
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 14ab193c6aa78ca95b85fa5c940af62fbb815abb

Comment by Alexander Zarochentsev [ 02/Aug/19 ]

Reproducer: https://review.whamcloud.com/35683 :

== sanityn test 104: Link to an orphan remote object should not succeed ============================== 22:38:41 (1564774721)
fail_loc=0x8000018a
sanityn.sh: line 4738: 23639 Terminated              $MULTIOP $DIR2/$tdir/mdt0dir/foodir/file2 Ow4096_c
 sanityn test_104: @@@@@@ FAIL: Link to an orphan object succeeded 
  Trace dump:
  = ./../tests/test-framework.sh:6114:error()
  = sanityn.sh:4739:test_104()
  = ./../tests/test-framework.sh:6416:run_one()
  = ./../tests/test-framework.sh:6455:run_one_logged()
  = ./../tests/test-framework.sh:6286:run_test()
  = sanityn.sh:4741:main()
Dumping lctl log to /tmp/test_logs/1564774698/sanityn.test_104.*.1564774724.log
Dumping logs only on local client.
Resetting fail_loc on all nodes...done.
FAIL 104 (3s)
cleanup: ======================================================
sh: fuser: command not found
sh: fuser: command not found
== sanityn test complete, duration 26 sec ============================================================ 22:38:44 (1564774724)
sanityn: FAIL: test_104 Link to an orphan object succeeded
rm: cannot remove '/mnt/lustre/d104.sanityn/mdt1dir': Directory not empty

The wrong link causes a cross-MDT link corruption:

[root@devvm1 tests]# stat /mnt/lustre/d104.sanityn/mdt1dir/file2x 
stat: cannot stat ‘/mnt/lustre/d104.sanityn/mdt1dir/file2x’: No such file or directory
[root@devvm1 tests]# 

file2x is a name that cannot be deleted; it points to a remote object that has already been removed from the OI but whose (unconnected) inode still exists.

Comment by Gerrit Updater [ 12/Aug/19 ]

Alexander Zarochentsev (c17826@cray.com) uploaded a new patch: https://review.whamcloud.com/35776
Subject: LU-11549 mdd: set LUSTRE_ORPHAN_FL for non-dirs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fd759510de938a3556eddf3b5798ce95135e1dbc

Comment by Alexander Zarochentsev [ 12/Aug/19 ]

The above patch https://review.whamcloud.com/35776 fixes the issue; my reproducer doesn't show any fs corruption. Unfortunately, it still reports an incorrect link creation even with the fix, because the test was not prepared for the client repeating the link() operation after getting an ESTALE error, so the second attempt to create the link succeeds.

The fix is a one-liner. mdd_mark_orphan_object() used to return early for non-directories, so LUSTRE_ORPHAN_FL was only ever set on directories; inverting the test makes it skip directories and set the flag on regular files as well:

diff --git a/lustre/mdd/mdd_dir.c b/lustre/mdd/mdd_dir.c
index 17613d6..260dd7a 100644
--- a/lustre/mdd/mdd_dir.c
+++ b/lustre/mdd/mdd_dir.c
@@ -1456,7 +1456,7 @@ static int mdd_mark_orphan_object(const struct lu_env *env,
        struct lu_attr *attr = MDD_ENV_VAR(env, la_for_start);
        int rc;
 
-       if (!S_ISDIR(mdd_object_type(obj)))
+       if (S_ISDIR(mdd_object_type(obj)))
                return 0;
 
        attr->la_valid = LA_FLAGS;
Comment by Gerrit Updater [ 29/Aug/19 ]

Alexander Zarochentsev (c17826@cray.com) uploaded a new patch: https://review.whamcloud.com/35991
Subject: LU-11549 tests: link succeded to an ophan remote object
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3e6574997f82407c98b90318448400bb3a1ca5e0

Comment by Gerrit Updater [ 07/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35776/
Subject: LU-11549 mdd: set LUSTRE_ORPHAN_FL for non-dirs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8d27c92a66d63aaf8b8fbe1fc73e49263b5bed1e

Comment by Andreas Dilger [ 11/Oct/19 ]

The fix for this problem has been landed for 2.13.0, so this ticket should be closed so that it can be tracked properly for the release. Since the test patch is not currently passing testing, I've opened a separate ticket to track that landing.
