[LU-11130] cross-target rename creates invalid symlink inodes Created: 09/Jul/18  Updated: 16/Jan/20  Resolved: 13/Nov/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0, Lustre 2.10.7

Type: Bug Priority: Major
Reporter: Alexander Zarochentsev Assignee: Alexander Zarochentsev
Resolution: Fixed Votes: 0
Labels: dne

Issue Links:
Related
is related to LU-11631 symlink migration should not create r... Resolved
is related to LU-11549 Unattached inodes after 3 min racer run. Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A cross-target rename of a symlink creates an empty local agent inode with i_size = 0. e2fsck complains about these inodes:

Symlink /REMOTE_PARENT_DIR/0x30004e816:0x17942:0x0/12 (inode #97469960) is invalid.
Clear? no
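
Why a zero-size symlink is reported as invalid can be modeled roughly as follows. This is a simplified sketch of the kind of sanity check e2fsck applies to inline ("fast") symlinks, not its actual code. The function name `fast_symlink_is_valid` is hypothetical, and the rules shown (non-zero size, target shorter than the 60-byte `i_block` area, stored length equal to `i_size`) are assumptions made for illustration.

```python
# Simplified model of a fast-symlink sanity check (illustrative only,
# not e2fsck's actual implementation).

I_BLOCK_BYTES = 60  # size of the inline i_block area in an ext2/ext4 inode


def fast_symlink_is_valid(i_size: int, inline_target: bytes) -> bool:
    """Return True if an inline (fast) symlink looks self-consistent."""
    if i_size == 0:
        # An empty symlink has no target at all; this is the agent-inode case.
        return False
    if i_size >= I_BLOCK_BYTES:
        # Too long to be stored inline together with its terminating NUL.
        return False
    # The stored target, up to the first NUL, must match the recorded size.
    target = inline_target.split(b"\0", 1)[0]
    return len(target) == i_size


# The empty agent inode from this report versus a symlink pointing to "foo".
print(fast_symlink_is_valid(0, b""))
print(fast_symlink_is_valid(3, b"foo"))
```

Under this model any symlink inode created without a body fails the first test, regardless of its other attributes.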

The issue can be easily reproduced:

1. start DNE system:

[root@vm1 tests]# MDSCOUNT=4 REFORMAT=no sh llmount.sh
Stopping clients: vm1.localdomain /mnt/lustre (opts:-f)
Stopping clients: vm1.localdomain /mnt/lustre2 (opts:-f)
Loading modules from /home/zam/git/lustre-wc-rel/lustre/tests/..
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
gss/krb5 is not supported
quota/lquota options: 'hash_lqs_cur_bits=3'
Formatting mgs, mds, osts
Format mds1: /tmp/lustre-mdt1
Format mds2: /tmp/lustre-mdt2
Format mds3: /tmp/lustre-mdt3
Format mds4: /tmp/lustre-mdt4
Format ost1: /tmp/lustre-ost1
Format ost2: /tmp/lustre-ost2
Checking servers environments
Checking clients vm1.localdomain environments
Loading modules from /home/zam/git/lustre-wc-rel/lustre/tests/..
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
gss/krb5 is not supported
Setup mgs, mdt, osts
Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
Commit the device label on /tmp/lustre-mdt1
Started lustre-MDT0000
Starting mds2:   -o loop /tmp/lustre-mdt2 /mnt/lustre-mds2
Commit the device label on /tmp/lustre-mdt2
Started lustre-MDT0001
Starting mds3:   -o loop /tmp/lustre-mdt3 /mnt/lustre-mds3
Commit the device label on /tmp/lustre-mdt3
Started lustre-MDT0002
Starting mds4:   -o loop /tmp/lustre-mdt4 /mnt/lustre-mds4
Commit the device label on /tmp/lustre-mdt4
Started lustre-MDT0003
Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/lustre-ost1
Commit the device label on /tmp/lustre-ost1
Started lustre-OST0000
Starting ost2:   -o loop /tmp/lustre-ost2 /mnt/lustre-ost2
Commit the device label on /tmp/lustre-ost2
Started lustre-OST0001
Starting client: vm1.localdomain:  -o user_xattr,flock vm1.localdomain@tcp:/lustre /mnt/lustre
UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID       125368        1956      112176   2% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID       125368        1760      112372   2% /mnt/lustre[MDT:1]
lustre-MDT0002_UUID       125368        1764      112368   2% /mnt/lustre[MDT:2]
lustre-MDT0003_UUID       125368        1768      112364   2% /mnt/lustre[MDT:3]
lustre-OST0000_UUID       325368       13924      284284   5% /mnt/lustre[OST:0]
lustre-OST0001_UUID       325368       13380      284828   4% /mnt/lustre[OST:1]

filesystem_summary:       650736       27304      569112   5% /mnt/lustre

Using TIMEOUT=20
seting jobstats to procname_uid
Setting lustre.sys.jobid_var from disable to procname_uid
Waiting 90 secs for update
Updated after 4s: wanted 'procname_uid' got 'procname_uid'
disable quota as required

2. create directories on other MDTs:

[root@vm1 tests]# for x in 1 2 3; do lfs mkdir -i $x /mnt/lustre/mdt$x-dir; done

3. create a symlink on MDT0:

[root@vm1 tests]# ln -s "foo" /mnt/lustre/bar-symlink

4. move the symlink to mdt1:

[root@vm1 tests]# mv /mnt/lustre/foo /mnt/lustre/mdt1-dir/
mv: cannot stat ‘/mnt/lustre/foo’: No such file or directory
[root@vm1 tests]# mv /mnt/lustre/bar-symlink /mnt/lustre/mdt1-dir/

5. check that the fs images are updated with MDT objects. Note that there are two MDT objects for "bar-symlink", one on MDT0 and one on MDT1. Both objects are of symlink type, but only the one on MDT0 has a symlink body and a Link EA; the one on MDT1 is the empty local agent inode.

[root@vm1 tests]# sync
[root@vm1 tests]# debugfs /tmp/lustre-mdt2
debugfs 1.42.13.wc6 (05-Feb-2017)
debugfs:  ls REMOTE_PARENT_DIR
 25001  (12) .    2  (12) ..    25039  (4072) 0x240000404:0x1:0x0
debugfs:  ls <25039>
 25039  (12) .    25001  (28) ..    149  (4056) bar-symlink
debugfs:  stat <149>
Inode: 149   Type: symlink    Mode:  0000   Flags: 0x0
Generation: 1006356438    Version: 0x00000000:00000000
User:     0   Group:     0   Project:     0   Size: 0
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5b3aad37:85bc158c -- Tue Jul  3 01:54:47 2018
 atime: 0x5b3aad37:85bc158c -- Tue Jul  3 01:54:47 2018
 mtime: 0x5b3aad37:85bc158c -- Tue Jul  3 01:54:47 2018
crtime: 0x5b3aad37:85bc158c -- Tue Jul  3 01:54:47 2018
Size of extra inode fields: 32
Extended attributes stored in inode body:
  lma = "00 00 00 00 02 00 00 00 04 04 00 00 02 00 00 00 01 00 00 00 00 00 00 00 " (24)
  lma: fid=[0x200000404:0x1:0x0] compat=0 incompat=2
Fast_link_dest:
debugfs:  
[root@vm1 tests]# debugfs /tmp/lustre-mdt1
debugfs 1.42.13.wc6 (05-Feb-2017)
debugfs:  ls ROOT
 25043  (12) .    2  (12) ..    25044  (36) .lustre    25049  (36) mdt1-dir
 25050  (36) mdt2-dir    25051  (3964) mdt3-dir
debugfs:  ls REMOTE_PARENT_DIR
 25001  (12) .    2  (12) ..    165  (4072) 0x200000404:0x1:0x0
debugfs:  ls <165>

<165>: Ext2 inode is not a directory
debugfs:  stat <165>
Inode: 165   Type: symlink    Mode:  0777   Flags: 0x0
Generation: 2421202347    Version: 0x00000001:00000010
User:     0   Group:     0   Project:     0   Size: 3
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5b3aad37:00000000 -- Tue Jul  3 01:54:47 2018
 atime: 0x5b3aad21:4f6ef7f0 -- Tue Jul  3 01:54:25 2018
 mtime: 0x5b3aad21:4f6ef7f0 -- Tue Jul  3 01:54:25 2018
crtime: 0x5b3aad21:4f6ef7f0 -- Tue Jul  3 01:54:25 2018
Size of extra inode fields: 32
Extended attributes stored in inode body:
  lma = "00 00 00 00 04 00 00 00 04 04 00 00 02 00 00 00 01 00 00 00 00 00 00 00 " (24)
  lma: fid=[0x200000404:0x1:0x0] compat=0 incompat=4
  selinux = "unconfined_u:object_r:unlabeled_t:s0\000" (37)
  link = "df f1 ea 11 01 00 00 00 35 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1d 00 00 00 02 40 00 04 04 00 00 00 01 00 00 00 00 62 61 72 2d 73 79 6d 6c 69 6e 6b " (53)
Fast_link_dest: foo
debugfs:  
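
The two `lma` hex dumps above can be decoded by hand to compare the inodes. The sketch below assumes the xattr is two little-endian 32-bit words (compat, incompat) followed by a FID (64-bit sequence, 32-bit object ID, 32-bit version), which agrees with the decoded `lma:` lines debugfs prints. The flag meanings (0x2 = agent inode, 0x4 = remote parent) are assumptions based on the `LMAI_*` constants in the Lustre headers and should be verified there.

```python
import struct

# Assumed meanings of lma_incompat bits (verify against the LMAI_* constants
# in the Lustre headers before relying on them).
INCOMPAT_FLAGS = {0x2: "agent", 0x4: "remote_parent"}


def decode_lma(hex_dump: str) -> dict:
    """Decode a 24-byte lma xattr given as space-separated hex bytes."""
    raw = bytes.fromhex(hex_dump)
    compat, incompat, seq, oid, ver = struct.unpack("<IIQII", raw)
    flags = [name for bit, name in sorted(INCOMPAT_FLAGS.items()) if incompat & bit]
    return {
        "fid": f"[{seq:#x}:{oid:#x}:{ver:#x}]",
        "compat": compat,
        "incompat": incompat,
        "flags": flags,
    }


# Inode 149 on /tmp/lustre-mdt2 (empty symlink) and inode 165 on /tmp/lustre-mdt1.
agent = decode_lma("00 00 00 00 02 00 00 00 04 04 00 00 02 00 00 00 01 00 00 00 00 00 00 00")
real = decode_lma("00 00 00 00 04 00 00 00 04 04 00 00 02 00 00 00 01 00 00 00 00 00 00 00")
print(agent)
print(real)
```

Both inodes carry the same FID; only the incompat word differs, which is what distinguishes the empty inode from the one holding the symlink body.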

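The `link` xattr on inode 165 records the parent FID and name the symlink has after the rename. A hedged decoder for that dump follows. It assumes a 24-byte little-endian header (magic, record count, total length, then 8 bytes that are zero here) followed by entries whose 2-byte record length and 16-byte parent FID are big-endian. This layout is inferred from the dump itself and from the Lustre `link_ea_header` and `link_ea_entry` structures, so treat it as an assumption.

```python
import struct

LINK_EA_MAGIC = 0x11EAF1DF  # first four bytes of the dump, read little-endian
HEADER_SIZE = 24            # assumed header size: 4 + 4 + 8 + 8 bytes


def decode_link_ea(hex_dump: str) -> list:
    """Return a list of (parent_fid, name) pairs from a link xattr hex dump."""
    raw = bytes.fromhex(hex_dump)
    magic, reccount, total_len = struct.unpack_from("<IIQ", raw, 0)
    if magic != LINK_EA_MAGIC:
        raise ValueError(f"unexpected magic {magic:#x}")
    if total_len != len(raw):
        raise ValueError(f"length mismatch: header says {total_len}, got {len(raw)}")
    entries = []
    offset = HEADER_SIZE
    for _ in range(reccount):
        # Each entry: 2-byte length, 16-byte parent FID, then the name.
        (reclen,) = struct.unpack_from(">H", raw, offset)
        seq, oid, ver = struct.unpack_from(">QII", raw, offset + 2)
        name = raw[offset + 18:offset + reclen].decode()
        entries.append((f"[{seq:#x}:{oid:#x}:{ver:#x}]", name))
        offset += reclen
    return entries


link_dump = (
    "df f1 ea 11 01 00 00 00 35 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 "
    "00 1d 00 00 00 02 40 00 04 04 00 00 00 01 00 00 00 00 "
    "62 61 72 2d 73 79 6d 6c 69 6e 6b"
)
print(decode_link_ea(link_dump))
```

The decoded parent FID matches the `0x240000404:0x1:0x0` entry listed under REMOTE_PARENT_DIR in the /tmp/lustre-mdt2 image.
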
6. run e2fsck on the images. e2fsck complains about the invalid symlink object on MDT1 (the agent inode, which does not contain a symlink body):

[root@vm1 tests]# e2fsck -fn /tmp/lustre-mdt1
e2fsck 1.42.13.wc6 (05-Feb-2017)
Warning!  /tmp/lustre-mdt1 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (33293, counted=32838).
Fix? no

Free inodes count wrong (99987, counted=99718).
Fix? no

lustre-MDT0000: 13/100000 files (46.2% non-contiguous), 29207/62500 blocks
[root@vm1 tests]# e2fsck -fn /tmp/lustre-mdt2
e2fsck 1.42.13.wc6 (05-Feb-2017)
Warning!  /tmp/lustre-mdt2 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Symlink /REMOTE_PARENT_DIR/0x240000404:0x1:0x0/bar-symlink (inode #149) is invalid.
Clear? no

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (33293, counted=32924).
Fix? no

Free inodes count wrong (99987, counted=99745).
Fix? no


lustre-MDT0001: ********** WARNING: Filesystem still has errors **********

lustre-MDT0001: 13/100000 files (7.7% non-contiguous), 29207/62500 blocks
[root@vm1 tests]#
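
When several MDT images have to be checked, the invalid-symlink reports can be pulled out of the `e2fsck -fn` output mechanically. A small sketch, assuming the message format shown in the output above; the helper name `invalid_symlinks` is hypothetical.

```python
import re

# Matches lines such as:
#   Symlink /REMOTE_PARENT_DIR/0x240000404:0x1:0x0/bar-symlink (inode #149) is invalid.
SYMLINK_RE = re.compile(r"^Symlink (?P<path>.+) \(inode #(?P<ino>\d+)\) is invalid\.$")


def invalid_symlinks(e2fsck_output: str) -> list:
    """Return (inode number, path) pairs for every invalid symlink reported."""
    found = []
    for line in e2fsck_output.splitlines():
        match = SYMLINK_RE.match(line.strip())
        if match:
            found.append((int(match.group("ino")), match.group("path")))
    return found


sample = """Pass 2: Checking directory structure
Symlink /REMOTE_PARENT_DIR/0x240000404:0x1:0x0/bar-symlink (inode #149) is invalid.
Clear? no

Pass 3: Checking directory connectivity
"""
print(invalid_symlinks(sample))
```

The inode numbers returned this way can be passed to `stat <N>` in debugfs, as in step 5, to confirm that the inode has Size: 0.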


 Comments   
Comment by Gerrit Updater [ 09/Jul/18 ]

Alexander Zarochentsev (c17826@cray.com) uploaded a new patch: https://review.whamcloud.com/32797
Subject: LU-11130 osd-ldiskfs: create non-empty local agent symlinks
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ed36f34c458df066326ef5c6b735174233441365

Comment by Andreas Dilger [ 06/Nov/18 ]

I created LU-11631 to track the improvement where rename will just move the whole symlink instead of leaving an agent inode behind.

Comment by Gerrit Updater [ 13/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32797/
Subject: LU-11130 osd-ldiskfs: create non-empty local agent symlinks
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c3a836364892cacbc4737645893b094971c6ec49

Comment by Peter Jones [ 13/Nov/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 07/Jan/19 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33973
Subject: LU-11130 osd-ldiskfs: create non-empty local agent symlinks
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: bc3809f679caa6d5dc166cc78f706af344dd55fc

Comment by Gerrit Updater [ 02/Mar/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33973/
Subject: LU-11130 osd-ldiskfs: create non-empty local agent symlinks
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: dd79052617be1002b0dcd6e36d74f8124c403177
