[LU-16808] lfs find --printf fails on FIFOs and special files Created: 09/May/23  Updated: 20/Jun/23  Resolved: 20/Jun/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Thomas Bertschinger Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11872 Request for option not to follow syml... Resolved
is related to LU-15837 "lfs find -printf" improvements Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
$ mkfifo myfifo
$ lfs find myfifo
myfifo 
$ lfs find myfifo --printf "%p\n"
** command hangs **
^C

$ sudo mknod mychardev c 1 5
$ lfs find mychardev
mychardev
$ lfs find mychardev --printf "%p\n"
lfs: failed for 'mychardev': Inappropriate ioctl for device


 Comments   
Comment by Patrick Farrell [ 09/May/23 ]

So it shouldn't hang on the fifo obviously, but can you explain the behavior you'd expect on these special files?  I mean, what does lfs find do in general if called on a regular file rather than a directory?

Comment by Thomas Bertschinger [ 09/May/23 ]

I suppose I'd expect the printf to print out the desired information without error (as long as there are no Lustre-specific format directives that don't apply to special files). I think this would mainly be relevant if you pass `lfs find` a directory and there are special files under that directory tree. I guess it's probably not common to put special files on Lustre so I'll admit this likely wouldn't be seen on real systems.

I would expect this to work without error:

$ cd /mnt/lustre
$ touch a
$ mknod mychardev c 1 5
$ lfs find /mnt/lustre --printf "%p\n"
/mnt/lustre
/mnt/lustre/a
lfs: failed for '/mnt/lustre': Inappropriate ioctl for device

But this is probably fine failing (unless maybe the %Li should just be empty or have a placeholder value in the line for the special file?):

$ $ lfs find /mnt/lustre --printf "%Li %p\n"
0 /mnt/lustre
1 /mnt/lustre/a
lfs: failed for '/mnt/lustre': Inappropriate ioctl for device

Also, I think that error message should include the filename that failed because if you're running find on a big tree it won't necessarily be obvious what failed.

Comment by Patrick Farrell [ 09/May/23 ]

"I guess it's probably not common to put special files on Lustre so I'll admit this likely wouldn't be seen on real systems."
Well, people definitely do it, and special files can also be an implementation detail of some libraries/program environments/build systems/etc, so they do matter for us.  I agree it's not a massive gap, but it's a valid concern.

FWIW, after reading your response, I think in general we should probably not return inappropriate ioctl (and obviously not for the requests which have a clear correct answer) - we should return whatever makes the most sense.  imo, it's confusing to get an error in the middle of an fs scan type operation.

Glancing in to the code, it looks like our ll_special_inode_operations (our inode operations for all file structs which are not directories, regular files, or links) does not provide the .unlocked_ioctl function, so Lustre has no support for Lustre specific ioctls on special files.  (I'm just going to use 'special files' to mean anything that's not a directory, regular file, or symlink.)

I think you have opened a bit of a can of worms here, unfortunately, since now you may have to consider which Lustre specific ioctls (and there are a lot of them) are meaningful on special files and give them implementations.  (Possibly it might make sense to share the ll_file_ioctl function as used for regular files, but you'd still have to walk through all the ioctls and add various checks.)

Oh and we don't provide unlocked_ioctl for symlinks either!  That's interesting; I wonder how that works in practice and how others/VFS handles this.

Like I said, a little bit of a can of worms.

It's probably possible to figure out what ioctls 'find' is using and just make sure those have handling for special files (and possibly symlinks too, as appropriate?), and ignore the rest of the Lustre specific ioctls.  That wouldn't be a regression from current behavior, where no Lustre specific ioctls work on special files at all (afaics).

Comment by Patrick Farrell [ 09/May/23 ]

Oh, and absolutely about the error message including the file name, that's obviously right.  That could be a quick separate patch if you wanted to do it.

Comment by Patrick Farrell [ 09/May/23 ]

Also, I didn't look at why find hangs on fifos, that's probably the most problematic issue here.

Comment by Andreas Dilger [ 09/May/23 ]

The FIFO hang is probably because the open() immediately gets blocked on waiting for some data to be available on the pipe? The strace shows:

write(1, "/mnt/testfs/newdir\n", 19/mnt/testfs/newdir
)    = 19
close(3)                                = 0
fstat(4, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl(4, F_GETFL)                       = 0x28800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW)
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
getdents(4, /* 3 entries */, 32768)     = 72
ioctl(4, _IOC(_IOC_READ|_IOC_WRITE, 0x69, 0x16, 0x140), 0x16f3040) = 0
ioctl(4, _IOC(_IOC_READ, 0x66, 0xaf, 0x4), 0x7ffd455d7480) = 0
open("/mnt/testfs/newdir/char", O_RDONLY) = 3
ioctl(3, FS_IOC_FSGETXATTR, 0x7ffd455d70b0) = -1 ENOTTY (Inappropriate ioctl for device)
close(3)                                = 0
ioctl(4, _IOC(_IOC_READ|_IOC_WRITE, 0x69, 0x16, 0x140), 0x1f84040) = 0
ioctl(4, _IOC(_IOC_READ, 0x66, 0xaf, 0x4), 0x7ffc5a629f30) = 0
open("/mnt/testfs/newdir/fifo", O_RDONLY

I suspect that the "semantic traverse" can already use the file type in the directory entry to skip trying to process such files.

In fact, looking at the strace output for regular file processing it seems like it is doing far too much for most files already:

1683657120.708615 getdents(4, /* 102 entries */, 32768) = 6448 <0.000784>
1683657120.709513 ioctl(4, _IOC(_IOC_READ|_IOC_WRITE, 0x69, 0x16, 0x140), 0xea2040) = 0 <0.002730>
1683657120.712309 open("/mnt/testfs/dir/filename_is_long-15", O_RDONLY) = 3 <0.003517>
1683657120.715904 ioctl(3, _IOC(_IOC_READ, 0x66, 0xaf, 0x4), 0x7ffc0a9b3ba0) = 0 <0.000140>
1683657120.716106 ioctl(3, FS_IOC_FSGETXATTR, 0x7ffc0a9b37d0) = 0 <0.000044>
1683657120.716243 newfstatat(4, "filename_is_long-15", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000683>
1683657120.717022 write(1, "/mnt/testfs/dir/filename_is_long-15", 36) = 36 <0.000296> /mnt/testfs/dir/filename_is_long-15
1683657120.717386 close(3) = 0 <0.002073>

The ioctl(4, _IOC(_IOC_READ|_IOC_WRITE, 0x69, 0x16, 0x140) = IOC_MDC_GETFILEINFO_V2) is expected for every file, since it gets the file layout and MDT attributes from the parent directory but we definitely should not be doing open() and ioctl(FS_IOC_FSGETXATTR) and newfstatat() on every file when we are only printing the pathname. That is a lot of overhead (more than double the original ioctl(IOC_MDC_GETFILEINFO_V2) call.

It looks like this was added as part of the -printf functionality, since running without -printf "%p" shows none of that overhead:

1683657295.265889 getdents(4, /* 102 entries */, 32768) = 6448 <0.000393>
1683657295.266362 write(1, "/mnt/testfs/dir/filename_is_long-15"..., 36) = 36 <0.000009> /mnt/testfs/dir/filename_is_long-15
1683657295.266446 write(1, "/mnt/testfs/dir/filename_is_long-44"..., 36) = 36 <0.000009> /mnt/testfs/dir/filename_is_long-44

I guess that is already described in LU-15837 under "selective fetching of metadata attributes. In particular, fetching the projid for a file requires an extra syscall that is unnecessary if the projid is not being printed" but it isn't just one syscall but multiple.

Comment by Gerrit Updater [ 10/May/23 ]

"Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50905
Subject: LU-16808 lfs: lfs find --printf won't hang on special files
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 935a7716d744d2c2009a363f0b1bdbaee84d0895

Comment by Gerrit Updater [ 10/May/23 ]

"Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50916
Subject: LU-16808 lfs: return error if projid cannot be retrieved
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 37871eccbaafe1e2b7a07dd74070ef7ba2d40751

Comment by Thomas Bertschinger [ 16/May/23 ]

I noticed another (potential) issue with symlinks and the project ID. For most attributes it appears lfs find uses data from the symlink itself, but for project ID, lfs find seems to use the value from the target:

$ touch file20 && lfs project -p 20 file20
$ ln -s file20 link21 && lfs project -p 21 link21

$ lfs project .
   20 - ./file20
   21 - ./link21

# including %s in the format string to demonstrate that lfs find uses values from the link itself (and not the target) for attributes other than project id
$ lfs find . --printf "%p %s %LP\n"
. 4096 0
./file20 0 20
./link21 6 20

Would we expect that lfs find would use the value from the link in this case?

Interestingly, I ran a quick test on ext4 and found that (at least in my setup) projid on symlinks did not work:

ext4-mount $ touch file20 && chattr -p 20 file20
ext4-mount $ ln -s file20 link21
 
ext4-mount $ chattr -p 21 link21
chattr: Operation not supported while reading flags on link21

ext4-mount $ lsattr -p .
    0 --------------e------- ./lost+found
lsattr: Operation not supported While reading flags on ./link21
   20 --------------e------- ./file20

I am not sure what the "expected" behavior of projid and symlinks is so I don't know if the lfs find behavior above is problematic.

@adilger any thoughts on this? If the current behavior is wrong should I submit a ticket for this?

Comment by Andreas Dilger [ 16/May/23 ]

Getting the projid on the symlink target is intentional behavior (LU-11872).

Comment by Gerrit Updater [ 20/Jun/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50905/
Subject: LU-16808 lfs: lfs find --printf won't hang on special files
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f47f816a5b1deabc1d501332a1a30af205d22515

Comment by Gerrit Updater [ 20/Jun/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50916/
Subject: LU-16808 lfs: return invalid=-1 if no projid retrieved
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8d09b3e77744973d10d2029028d7e6f14591bc6d

Comment by Peter Jones [ 20/Jun/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:30:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.