[LU-13823] Two hard links to the same directory Created: 27/Jul/20 Updated: 03/Feb/23 Resolved: 03/Feb/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Olaf Faaland | Assignee: | Andreas Dilger |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | llnl |
| Environment: | kernel-3.10.0-1127.0.0.1chaos.ch6.x86_64 |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
2 directories in the same filesystem have the same inode:

[root@rzslic5]==> ls -lid /p/czlustre2/reza2/5_star_pattern_J
288233885643309090 drwx------ 3 58904 58904 33280 Jul 24 15:55 /p/czlustre2/reza2/5_star_pattern_J
[root@rzslic5]==> ls -lid /p/czlustre2/reza2/5_star_pattern_J_2/
288233885643309090 drwx------ 3 58904 58904 33280 Jul 24 15:55 /p/czlustre2/reza2/5_star_pattern_J_2/

and the same FID:

[root@rzslic2:reza2]# lfs path2fid 5_star_pattern_J
[0x40003311e:0x22:0x0]
[root@rzslic2:reza2]# lfs path2fid 5_star_pattern_J_2
[0x40003311e:0x22:0x0]

The directory has one subdirectory:

[root@oslic7:reza2]# ls -al 5_star_pattern_J
total 130
drwx------   3 pearce7 pearce7 33280 Jul 24 15:55 .
drwx------ 155 reza2   reza2   57856 Jul 27 12:54 ..
drwx------   2 pearce7 pearce7 41472 Sep 21  2019 0 |
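A minimal sketch of making the same duplicate check programmatically, using plain stat(2) to compare device and inode numbers for two paths (equivalent to the ls -lid comparison above; lfs fid2path can likewise be used from a client to list the names the MDS has recorded for a FID):

/* Sketch: report whether two paths resolve to the same object,
 * i.e. same st_dev and st_ino, as the ls -lid output above shows
 * for the two directories. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
        struct stat a, b;

        if (argc != 3) {
                fprintf(stderr, "usage: %s PATH1 PATH2\n", argv[0]);
                return 2;
        }
        if (stat(argv[1], &a) != 0 || stat(argv[2], &b) != 0) {
                perror("stat");
                return 2;
        }
        if (a.st_dev == b.st_dev && a.st_ino == b.st_ino) {
                printf("same object: dev=%lu ino=%lu nlink=%lu\n",
                       (unsigned long)a.st_dev, (unsigned long)a.st_ino,
                       (unsigned long)a.st_nlink);
                return 1;
        }
        printf("different objects\n");
        return 0;
}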
| Comments |
| Comment by Olaf Faaland [ 27/Jul/20 ] |
|
Perhaps the underlying cause of |
| Comment by Olaf Faaland [ 27/Jul/20 ] |
|
stat shows:

[root@rzslic2:reza2]# stat 5_star_pattern_J
  File: '5_star_pattern_J'
  Size: 33280        Blocks: 65         IO Block: 131072 directory
Device: a2f6a642h/2734073410d    Inode: 288233885643309090  Links: 3
Access: (0700/drwx------)  Uid: (58904/ UNKNOWN)   Gid: (58904/ UNKNOWN)
Access: 2020-07-27 10:08:25.000000000 -0700
Modify: 2020-07-24 15:55:46.000000000 -0700
Change: 2020-07-24 15:55:46.000000000 -0700
 Birth: - |
| Comment by Olaf Faaland [ 27/Jul/20 ] |
|
The underlying object was created in September 2019:

[root@zinc9:~]# zdb -dddddddd zinc9/mdt1@toss-4847 117510086
Dataset zinc9/mdt1@toss-4847 [ZPL], ID 529, cr_txg 184688294, 162G, 57992145 objects, rootbp DVA[0]=<1:1fd6afb000:1000> DVA[1]=<2:3b680fe000:1000> [L0 DMU objset] fletcher4 uncompressed LE contiguous unique double size=800L/800P birth=184688294L/184688294P fill=57992145 cksum=c37b11463:e9a37744f6d:b651035441589:6a1e4605d58ac95

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
 117510086    2   128K    16K    32K     512    32K  100.00  ZFS directory (K=inherit) (Z=inherit)
                                         192   bonus  System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
        dnode maxblkid: 1
        path    ???<object#117510086>
        uid     58904
        gid     58904
        atime   Mon Jul 27 10:08:25 2020
        mtime   Fri Jul 24 15:55:46 2020
        ctime   Fri Jul 24 15:55:46 2020
        crtime  Fri Sep 20 19:23:47 2019
        gen     95807852
        mode    40700
        size    2
        parent  8807988
        links   3
        pflags  0
        rdev    0x0000000000000000
        SA xattrs: 212 bytes, 3 entries
                trusted.lma = \000\000\000\000\000\000\000\000\0361\003\000\004\000\000\000"\000\000\000\000\000\000\000
                trusted.version = Y4\3243(\000\000\000
                trusted.link = \337\361\352\021\001\000\000\000<\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000$\000\000\000\002\200\005`H\000\000\000\007\000\000\000\0005_star_pattern_J_2 |
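For reference, the trusted.link blob above follows Lustre's link EA layout; a sketch of that layout (field names approximate, based on my reading of lustre_idl.h), with the values decoded by hand from this dump as comments:

/* Approximate layout of the trusted.link xattr (link EA). */
#include <stdint.h>

#define LINK_EA_MAGIC 0x11EAF1DFUL      /* dump begins \337\361\352\021 = 0x11eaf1df LE */

struct link_ea_header {
        uint32_t leh_magic;             /* 0x11eaf1df */
        uint32_t leh_reccount;          /* 1 -> only one name recorded */
        uint64_t leh_len;               /* 0x3c = 60 bytes total */
        uint32_t leh_reserved[2];       /* zero in the dump */
};

struct link_ea_entry {                  /* one per recorded name */
        unsigned char lee_reclen[2];    /* big-endian u16: 0x0024 = 36 = 2 + 16 + 18 */
        unsigned char lee_parent_fid[16]; /* packed parent FID */
        char lee_name[];                /* "5_star_pattern_J_2", not NUL-terminated */
} __attribute__((packed));

If this decoding is right, leh_reccount of 1 and the single 36-byte record mean that only the name "5_star_pattern_J_2" is recorded in the link xattr for this directory.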
| Comment by Andreas Dilger [ 27/Jul/20 ] |
|
Note that birth time will be directly accessible on Lustre 2.14 clients with a suitably new (maybe RHEL8?) stat command that uses statx() under the covers.

Is it possible to ask the user how these two directories were created? Was there a rename, or were they possibly created in parallel? Was "lfs migrate -m" used on the directory to migrate between MDTs?

The "trusted.link" xattr shows only "5_star_pattern_J_2" as the name of the directory. The dnode shows "links 3", but that could be because of the subdirectory, and not necessarily because of multiple hard links to the file, so it would be useful to check. If the client had (somehow) allowed multiple hard links to the directory, it should also have added the second filename to the "trusted.link" xattr at that time.

Have you tried creating hard links to a directory with ZFS? This should be caught by the client VFS, and also by ldiskfs, but I'm wondering whether ldiskfs implements such a check while ZFS only does it in the ZPL, which osd-zfs does not go through?

I ran a quick test on ldiskfs, and I was surprised to see that a hard-link RPC is sent to the MDS in such a case when the "ln" binary is not used. The ln binary stat()s the source and target names itself, to check whether they exist and what file type they are, before even calling the link() syscall:

stat("/mnt/testfs/link", 0x7ffee35aa3c0) = -1 ENOENT (No such file or directory)
lstat("/mnt/testfs/newdir", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
write(2, "ln: /mnt/testfs/newdir: hard link not allowed for directory") = 61

Running multiop (from the lustre-tests RPM) doesn't do any sanity checking before calling the link() syscall:

mkdir /mnt/testfs/newdir
strace multiop /mnt/testfs/newdir L /mnt/testfs/link
link("/mnt/testfs2/newerdir", "/mnt/testfs2/link3") = -1 EPERM (Operation not permitted)
write(3, "link(): Operation not permitted\n", 32

and this generates an RPC to the MDS, so it seems possible that some user binary calling link(2) directly, instead of going through ln(1), could have triggered this? |
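As an aside on the statx() remark above, a minimal sketch of reading a file's birth time through statx(2) (requires glibc 2.28+ and kernel 4.11+; whether the Lustre client actually fills stx_btime depends on the client version, per the comment above):

/* Print a path's birth time via statx(2), if the filesystem reports it. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(int argc, char **argv)
{
        struct statx stx;

        if (argc != 2) {
                fprintf(stderr, "usage: %s PATH\n", argv[0]);
                return 2;
        }
        if (statx(AT_FDCWD, argv[1], 0, STATX_BTIME, &stx) != 0) {
                perror("statx");
                return 2;
        }
        if (stx.stx_mask & STATX_BTIME) {
                time_t t = stx.stx_btime.tv_sec;
                printf("birth: %s", ctime(&t));
        } else {
                printf("birth time not reported for this path\n");
        }
        return 0;
}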
| Comment by Olaf Faaland [ 28/Jul/20 ] |
I'm working on getting that information. There's been a complex chain of ownership so we're working through it.
There is one subdirectory, named "0". Sorry I left that out of the description.
I haven't yet figured out where it's checked, but neither ZPL nor our Lustre 2.10.8 backed by ZFS 0.7 allowed hard linking to a directory via link(2) when I tried it. In both cases link() failed and errno was set to EPERM, as you saw with your test. But there was nothing exciting going on while I tried that, like many processes in parallel, or a failover, etc.

bash-4.2$ ll -d existing newlink
ls: cannot access newlink: No such file or directory
drwx------ 2 faaland1 faaland1 33280 Jul 27 16:50 existing
bash-4.2$ strace -e link ./dolink existing newlink;
link("existing", "newlink") = -1 EPERM (Operation not permitted)
+++ exited with 255 +++

Yes, maybe. I hope we're able to find out how these directories were created. |
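The dolink helper used above isn't included in the ticket; a hypothetical reconstruction of such a helper, in the same spirit as multiop, would simply call link(2) directly so that the kernel and Lustre, rather than ln(1), make the policy decision:

/* Hypothetical dolink-style helper (the real dolink source is not in
 * the ticket): attempt link(2) on the given paths with no up-front
 * sanity checking, so any refusal comes from the kernel/filesystem. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        if (argc != 3) {
                fprintf(stderr, "usage: %s EXISTING NEW\n", argv[0]);
                return 2;
        }
        if (link(argv[1], argv[2]) != 0) {
                fprintf(stderr, "link(): %s\n", strerror(errno));
                return 255;     /* matches the "exited with 255" strace output above */
        }
        return 0;
}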
| Comment by Olaf Faaland [ 26/Aug/20 ] |
|
Andreas,

We weren't able to get any good information about what led up to this. We believe it occurred during the "mv" operation below (this is bash history from a node the sysadmin was using):

40  cd /p/czlustre3/pearce7
41  ls
42  cd reza2
43  ls
44  ls /p/czlustre3/reza2/
45  pwd
46  mv * /p/czlustre3/reza2/

and that before the "mv" command

During the "mv" command the sysadmin got an error message that 5_star_pattern_J already existed in the target. At that point he looked and saw that both the source and target directory had a subdirectory by that name, and then found they were two references to the same directory. It's hard to see how this sequence of events could create the problem, but that's unfortunately all we were able to find out. |
| Comment by Olaf Faaland [ 01/Sep/20 ] |
|
Andreas, I just noticed that the bash_history contents I pasted in above are for the wrong file system (/p/czlustre3 AKA /p/lustre3, != /p/lustre2). It's typical that our users have quota and a directory on multiple Lustre file systems. So we have no context at all. I don't see what else we can do. If you can think of anything else we should look at, or a debug patch that would be helpful, let me know. thanks |
| Comment by Olaf Faaland [ 30/Dec/20 ] |
|
Removing "topllnl" tag because I do not see any way for us to get more information about what happened. I'm going to leave it open in case we see the same problem again, or someone else does. |