[LU-13823] Two hard links to the same directory Created: 27/Jul/20  Updated: 03/Feb/23  Resolved: 03/Feb/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Olaf Faaland Assignee: Andreas Dilger
Resolution: Cannot Reproduce Votes: 0
Labels: llnl
Environment:

kernel-3.10.0-1127.0.0.1chaos.ch6.x86_64
zfs-0.7.11-9.4llnl.ch6.x86_64
lustre-2.10.8_9.chaos-1.ch6.x86_64


Issue Links:
Related
is related to LU-13758 corrupt directory entry: FID is invalid Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Two directories in the same filesystem have the same inode number:

[root@rzslic5]==> ls -lid /p/czlustre2/reza2/5_star_pattern_J
288233885643309090 drwx------ 3 58904 58904 33280 Jul 24 15:55 /p/czlustre2/reza2/5_star_pattern_J
[root@rzslic5]==> ls -lid /p/czlustre2/reza2/5_star_pattern_J_2/
288233885643309090 drwx------ 3 58904 58904 33280 Jul 24 15:55 /p/czlustre2/reza2/5_star_pattern_J_2/

and the same FID:

[root@rzslic2:reza2]# lfs path2fid 5_star_pattern_J
[0x40003311e:0x22:0x0]
[root@rzslic2:reza2]# lfs path2fid 5_star_pattern_J_2
[0x40003311e:0x22:0x0] 

The directory has one subdirectory:

[root@oslic7:reza2]# ls -al 5_star_pattern_J
total 130
drwx------   3 pearce7 pearce7 33280 Jul 24 15:55 .
drwx------ 155 reza2   reza2   57856 Jul 27 12:54 ..
drwx------   2 pearce7 pearce7 41472 Sep 21  2019 0


 Comments   
Comment by Olaf Faaland [ 27/Jul/20 ]

Perhaps the underlying cause of LU-13758?

Comment by Olaf Faaland [ 27/Jul/20 ]

stat shows:

[root@rzslic2:reza2]# stat 5_star_pattern_J
  File: '5_star_pattern_J'
  Size: 33280     	Blocks: 65         IO Block: 131072 directory
Device: a2f6a642h/2734073410d	Inode: 288233885643309090  Links: 3
Access: (0700/drwx------)  Uid: (58904/ UNKNOWN)   Gid: (58904/ UNKNOWN)
Access: 2020-07-27 10:08:25.000000000 -0700
Modify: 2020-07-24 15:55:46.000000000 -0700
Change: 2020-07-24 15:55:46.000000000 -0700
 Birth: - 
Comment by Olaf Faaland [ 27/Jul/20 ]

The underlying object was created in September 2019:

[root@zinc9:~]# zdb -dddddddd zinc9/mdt1@toss-4847 117510086
Dataset zinc9/mdt1@toss-4847 [ZPL], ID 529, cr_txg 184688294, 162G, 57992145 objects, rootbp DVA[0]=<1:1fd6afb000:1000> DVA[1]=<2:3b680fe000:1000> [L0 DMU objset] fletcher4 uncompressed LE contiguous unique double size=800L/800P birth=184688294L/184688294P fill=57992145 cksum=c37b11463:e9a37744f6d:b651035441589:6a1e4605d58ac95


    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
 117510086    2   128K    16K    32K     512    32K  100.00  ZFS directory (K=inherit) (Z=inherit)
                                               192   bonus  System attributes
	dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
	dnode maxblkid: 1
	path	???<object#117510086>
	uid     58904
	gid     58904
	atime	Mon Jul 27 10:08:25 2020
	mtime	Fri Jul 24 15:55:46 2020
	ctime	Fri Jul 24 15:55:46 2020
	crtime	Fri Sep 20 19:23:47 2019
	gen	95807852
	mode	40700
	size	2
	parent	8807988
	links	3
	pflags	0
	rdev	0x0000000000000000
	SA xattrs: 212 bytes, 3 entries


		trusted.lma = \000\000\000\000\000\000\000\000\0361\003\000\004\000\000\000"\000\000\000\000\000\000\000
		trusted.version = Y4\3243(\000\000\000
		trusted.link = \337\361\352\021\001\000\000\000<\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000$\000\000\000\002\200\005`H\000\000\000\007\000\000\000\0005_star_pattern_J_2
 
Comment by Andreas Dilger [ 27/Jul/20 ]

Note that birth time will be directly accessible on Lustre 2.14 clients with a suitably new (maybe RHEL8?) stat command that uses statx() under the covers.
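
For reference, a minimal sketch (mine, not from the ticket) of reading the birth time through statx(2) on such a client; it assumes a glibc new enough (2.28+) to provide the statx() wrapper, and the path/usage here is purely illustrative:

#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD */
#include <sys/stat.h>   /* statx(), struct statx, STATX_BTIME */
#include <stdio.h>

int main(int argc, char **argv)
{
        struct statx stx;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <path>\n", argv[0]);
                return 2;
        }
        if (statx(AT_FDCWD, argv[1], 0, STATX_BTIME, &stx) != 0) {
                perror("statx");
                return 1;
        }
        if (stx.stx_mask & STATX_BTIME)
                printf("btime: %lld.%09u\n",
                       (long long)stx.stx_btime.tv_sec, stx.stx_btime.tv_nsec);
        else
                printf("birth time not reported by this filesystem/client\n");
        return 0;
}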

Is it possible to ask the user how these two directories were created? Was there a rename, or were they possibly created in parallel? Was "lfs migrate -m" used on the directory to migrate between MDTs?

The "trusted.link" xattr shows only "5_star_pattern_J_2" for the name of the directory.

The dnode shows "links 3", but that could be because of a subdirectory, and not necessarily because of multiple hard links to the file, but it would be useful to check. If the client had (somehow) allowed multiple hard links to the directory, it should also have added the filename to the "trusted.link" xattr at that time.
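
As an aside, here is a quick way to make that check from a client (my sketch, not part of the ticket): for a POSIX directory st_nlink is normally 2 plus the number of subdirectories, so a larger value points at extra hard links (or on-disk corruption):

#define _DEFAULT_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
        struct stat st;
        struct dirent *de;
        DIR *d;
        unsigned long subdirs = 0;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <directory>\n", argv[0]);
                return 2;
        }
        if (stat(argv[1], &st) != 0 || (d = opendir(argv[1])) == NULL) {
                perror(argv[1]);
                return 1;
        }
        while ((de = readdir(d)) != NULL) {
                if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                        continue;
                /* d_type can be DT_UNKNOWN on some filesystems; a complete
                 * version would fall back to stat() on each entry. */
                if (de->d_type == DT_DIR)
                        subdirs++;
        }
        closedir(d);
        printf("st_nlink=%lu, expected 2+subdirs=%lu\n",
               (unsigned long)st.st_nlink, 2 + subdirs);
        return 0;
}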

Have you tried creating hard links to a directory with ZFS? This should be caught by the client VFS, and also by ldiskfs, but I'm wondering whether ldiskfs implements such a check itself, while ZFS does it only in the ZPL, so it is not checked by osd-zfs?

I ran a quick test on ldiskfs, and was surprised to see that when the "ln" binary is not used, a hard-link RPC is actually sent to the MDS in this case. The ln binary stat()s the source and target names itself, to check whether they exist and what type they are, before even calling the link() syscall:

stat("/mnt/testfs/link", 0x7ffee35aa3c0) = -1 ENOENT (No such file or directory)
lstat("/mnt/testfs/newdir", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
write(2, "ln: /mnt/testfs/newdir: hard link not allowed for directory") = 61

multiop (from the lustre-tests RPM), by contrast, doesn't do any sanity checking before calling the link() syscall:

mkdir /mnt/testfs/newdir
strace multiop /mnt/testfs/newdir L /mnt/testfs/link

link("/mnt/testfs2/newerdir", "/mnt/testfs2/link3") = -1 EPERM (Operation not permitted)
write(3, "link(): Operation not permitted\n", 32

and this generates an RPC to the MDS, so it seems possible that if some user binary was calling link(2) directly instead of using ln(1), it might trigger this?

Comment by Olaf Faaland [ 28/Jul/20 ]

Is it possible to ask the user how these two directories were created? Was there a rename, or were they possibly created in parallel? Was "lfs migrate -m" used on the directory to migrate between MDTs?

I'm working on getting that information. There's been a complex chain of ownership so we're working through it.

The "trusted.link" xattr shows only "5_star_pattern_J_2" for the name of the directory.

The dnode shows "links 3", but that could be because of a subdirectory, and not necessarily because of multiple hard links to the file, but it would be useful to check. If the client had (somehow) allowed multiple hard links to the directory, it should also have added the filename to the "trusted.link" xattr at that time.

There is one subdirectory, named "0". Sorry I left that out of the description.

Have you tried creating hard links to a directory with ZFS? This should be caught by the client VFS, and also by ldiskfs, but I'm wondering whether ldiskfs implements such a check itself, while ZFS does it only in the ZPL, so it is not checked by osd-zfs?

I haven't yet figured out where it's checked, but neither ZPL nor our Lustre 2.10.8 backed by ZFS 0.7 allowed hard linking to a directory via link(2) when I tried it. In both cases link() failed and errno was set to EPERM, as you saw with your test. But there was nothing exciting going on while I tried that, like many processes in parallel, or a failover, etc.

bash-4.2$ ll -d existing newlink
ls: cannot access newlink: No such file or directory
drwx------ 2 faaland1 faaland1 33280 Jul 27 16:50 existing

bash-4.2$ strace -e link ./dolink existing newlink;
link("existing", "newlink")             = -1 EPERM (Operation not permitted)
+++ exited with 255 +++
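
(The dolink source isn't included in the ticket; presumably it is just a thin wrapper that passes its arguments straight to link(2). A minimal sketch of such a helper, as an assumption about what it does:)

/* dolink.c - minimal sketch of a helper that calls link(2) directly,
 * skipping the sanity checks ln(1) performs before the syscall. */
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        if (argc != 3) {
                fprintf(stderr, "usage: %s <existing> <new>\n", argv[0]);
                return 2;
        }
        if (link(argv[1], argv[2]) != 0) {
                perror("link");    /* EPERM expected when <existing> is a directory */
                return -1;         /* exits with status 255, as in the strace above */
        }
        return 0;
}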

and this generates an RPC to the MDS, so it seems possible that if some user binary was calling link(2) directly instead of using ln(1), it might trigger this?

Yes, maybe. I hope we're able to find out how these directories were created.

Comment by Olaf Faaland [ 26/Aug/20 ]

Andreas,

We weren't able to get any good information about what led up to this. We believe it occurred during the "mv" operation below (this is bash history from a node the sysadmin was using):

40 cd /p/czlustre3/pearce7
41 ls
42 cd reza2
43 ls
44 ls /p/czlustre3/reza2/
45 pwd
46 mv * /p/czlustre3/reza2/

and that before the "mv" command:

  • /p/czlustre3/reza2/ was empty, and
  • /p/czlustre3/pearce7/reza2/5_star_pattern_J was an apparently normal directory.

During the "mv" command the sysadmin got an error message that 5_star_pattern_J already existed in the target. At that point he looked and saw that both the source and target directory had a subdirectory by that name, and then found they were two references to the same directory.

It's hard to see how this sequence of events could create the problem, but that's unfortunately all we were able to find out.

Comment by Olaf Faaland [ 01/Sep/20 ]

Andreas,

I just noticed that the bash_history contents I pasted in above are for the wrong file system (/p/czlustre3, AKA /p/lustre3, not /p/lustre2). It's typical for our users to have quota and a directory on multiple Lustre file systems.

So we have no context at all. I don't see what else we can do. If you can think of anything else we should look at, or a debug patch that would be helpful, let me know.

thanks

Comment by Olaf Faaland [ 30/Dec/20 ]

Removing "topllnl" tag because I do not see any way for us to get more information about what happened. I'm going to leave it open in case we see the same problem again, or someone else does.
