Description
Per Rick's LUG presentation, there are still a number of optimizations for "lfs find -printf" that were not included when the original patch https://review.whamcloud.com/45136 "LU-10378 utils: add formatted printf to lfs find" landed, and that should still be addressed:
- selective fetching of metadata attributes. In particular, fetching the projid for a file requires an extra syscall that is unnecessary if the projid is not being printed, at least until LU-12480 is implemented. Most of the other MDT attributes are "free" once any attribute is read for a file, so it probably doesn't make sense to micro-optimize them.
- pre-parsing of the -printf argument string (assuming this actually shows up in the CPU profile; I'm not sure whether it matters compared to the stat/RPC overhead)
- maybe others; I didn't catch all of them
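Both items above amount to doing the format-string work once per "lfs find" run instead of once per file. A minimal sketch in C, under stated assumptions: the -printf string is split into literal/directive tokens a single time, and the same pass ORs together a STATX_-style mask of the attributes the directives will need. `struct fmt_token`, `struct fmt_prog`, `fmt_compile()` and `token_mask()` are hypothetical names, only a handful of find(1)-style directives are mapped, and `%LP` is assumed (not confirmed) to be the projid directive.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>	/* STATX_* mask bits (glibc >= 2.28) */

#ifndef STATX_PROJID
#define STATX_PROJID 0x40000000	/* only used internally */
#endif

/* Hypothetical pre-parsed form of one piece of a -printf format. */
struct fmt_token {
	const char *lit;	/* literal text preceding the directive */
	size_t lit_len;		/* length of lit (it is not NUL-terminated) */
	char dir;		/* directive letter, or 0 if none */
	char sub;		/* second letter of a two-letter %L directive */
};

struct fmt_prog {
	struct fmt_token *tok;
	size_t ntok;
	unsigned int need;	/* STATX_* bits the format actually uses */
};

/* Map one directive to the attributes it needs (subset, for illustration). */
static unsigned int token_mask(const struct fmt_token *t)
{
	switch (t->dir) {
	case 's': return STATX_SIZE;
	case 'u': case 'U': return STATX_UID;
	case 'g': case 'G': return STATX_GID;
	case 'm': return STATX_MODE;
	case 'n': return STATX_NLINK;
	case 'i': return STATX_INO;
	case 'L': return t->sub == 'P' ? STATX_PROJID : 0; /* assumed: %LP */
	default:  return 0;	/* %p, %%, plain text: nothing to fetch */
	}
}

/* Parse fmt once; the per-file loop then only walks prog->tok. */
static int fmt_compile(const char *fmt, struct fmt_prog *prog)
{
	const char *p = fmt;

	prog->ntok = 0;
	prog->need = 0;
	/* every token consumes at least one byte, so strlen + 1 is enough */
	prog->tok = calloc(strlen(fmt) + 1, sizeof(*prog->tok));
	if (prog->tok == NULL)
		return -1;

	while (*p != '\0') {
		const char *pct = strchr(p, '%');
		struct fmt_token *t = &prog->tok[prog->ntok++];

		t->lit = p;
		t->lit_len = pct ? (size_t)(pct - p) : strlen(p);
		if (pct == NULL)
			break;			/* trailing literal text */
		if (pct[1] == 'L' && pct[2] != '\0') {
			t->dir = 'L';
			t->sub = pct[2];
			p = pct + 3;
		} else if (pct[1] != '\0') {
			t->dir = pct[1];
			p = pct + 2;
		} else {
			p = pct + 1;		/* lone '%' at the end: ignored */
		}
		prog->need |= token_mask(t);
	}
	return 0;
}
```

With a format of `"%p\n"` the resulting `need` mask is 0, so a per-file loop built this way could skip every attribute fetch and print only the pathname; the caller frees `prog->tok` when the run is done.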
Issue Links
- Clones
  - LU-15837 "lfs find -printf" improvements (Resolved)
- is related to
  - LU-16560 'lfs find -printf %w' does not print birth time (Open)
  - LU-16808 lfs find --printf fails on FIFOs and special files (Resolved)
  - LU-17219 lfs find: add ability to print extended attributes (Open)
  - LU-16760 "lfs find" support for fscrypt and other file attributes (Resolved)
  - LU-5170 lfs usability (Open)
  - LU-12480 add STATX_PROJID to upstream kernel (Open)
  - LU-7495 lfs find is missing "-links" support (Resolved)
  - LU-10378 "lfs find" is missing "-printf" support (Resolved)
  - LU-15504 "lfs find" is missing "-ls" support (Resolved)
  - LU-15743 "lfs find" is missing "-xattr" support (Resolved)
  - LU-16798 lfs find: new --jobid option (Closed)
Looking at the strace output for regular-file processing, it seems to be doing far too much work for most files:
The ioctl(4, _IOC(_IOC_READ|_IOC_WRITE, 0x69, 0x16, 0x140)) call (IOC_MDC_GETFILEINFO_V2) is expected for every file, since it gets the file layout and MDT attributes from the parent directory. However, we definitely should not be doing open(), ioctl(FS_IOC_FSGETXATTR) (to get the projid), and newfstatat() (not sure why?) on every file when we are only printing the pathname. That is a lot of overhead: more than double the original ioctl(IOC_MDC_GETFILEINFO_V2) call.
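For reference, a projid lookup of the kind described above looks roughly like the following. This is a sketch, not the actual lfs code: `get_projid()` is a hypothetical helper built on the standard `FS_IOC_FSGETXATTR` ioctl and `struct fsxattr` from `<linux/fs.h>`. It shows why the projid costs an open(), an ioctl() and a close() per file, which is pure overhead when the format never prints the projid. O_PATH cannot be used to make the open cheaper, because ioctl() on an O_PATH descriptor fails with EBADF; O_NONBLOCK is added as a precaution so that opening a FIFO does not block.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FS_IOC_FSGETXATTR, struct fsxattr */

/*
 * Hypothetical helper: fetch a file's project ID with one open(), one
 * ioctl(FS_IOC_FSGETXATTR) and one close(). Returns 0 on success or a
 * negative errno; *projid is only written on success.
 */
static int get_projid(const char *path, unsigned int *projid)
{
	struct fsxattr fsx;
	int fd, rc = 0;

	/* O_NONBLOCK so a FIFO open does not wait for a writer */
	fd = open(path, O_RDONLY | O_NONBLOCK | O_NOCTTY | O_CLOEXEC);
	if (fd < 0)
		return -errno;

	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
		rc = -errno;	/* e.g. filesystem without fsxattr support */
	else
		*projid = fsx.fsx_projid;

	close(fd);
	return rc;
}
```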
Running without -printf "%p" shows none of that overhead:
I'm thinking that "gather_all" should be changed into a STATX_ bitmask that indicates which fields are needed, so that the code can check the bitmask when the attributes are being fetched:
```c
#ifndef STATX_PROJID
#define STATX_PROJID 0x40000000 /* only used internally */
#endif

if (param->fp_check_projid || (gather_printf & STATX_PROJID)) {
	/* only now pay for open() + ioctl(FS_IOC_FSGETXATTR) */
}
```
The STATX_PROJID definition would only be used internally until the kernel itself can return the projid as part of statx() (LU-12480).
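A sketch of how such a bitmask could gate the fetches, assuming a `need` mask of STATX_ bits has already been computed from the -printf string. `fetch_attrs()` and `struct fetch_stats` are hypothetical; the real lfs code gets its attributes via IOC_MDC_GETFILEINFO_V2, and statx() (glibc 2.28 or later) is used here only so the sketch runs on an ordinary Linux filesystem. The point is the control flow: the internal STATX_PROJID bit is masked off before anything is passed to the kernel, statx() is skipped entirely when no other bit is set, and the open()/ioctl(FS_IOC_FSGETXATTR) pair only happens when STATX_PROJID is requested.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>	/* AT_FDCWD, AT_SYMLINK_NOFOLLOW, open() */
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>	/* statx(), STATX_* (glibc >= 2.28) */
#include <unistd.h>
#include <linux/fs.h>	/* FS_IOC_FSGETXATTR, struct fsxattr */

#ifndef STATX_PROJID
#define STATX_PROJID 0x40000000	/* only used internally */
#endif

/* Counters so a caller can see which lookups were actually made. */
struct fetch_stats {
	unsigned int nstatx;
	unsigned int nprojid;
};

/* Hypothetical helper: fetch only the attributes named in 'need'. */
static int fetch_attrs(const char *path, unsigned int need,
		       struct statx *stx, unsigned int *projid,
		       struct fetch_stats *st)
{
	/* the kernel does not know the internal projid bit, so strip it */
	unsigned int kmask = need & ~(unsigned int)STATX_PROJID;

	memset(stx, 0, sizeof(*stx));
	if (kmask != 0) {
		st->nstatx++;
		if (statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW, kmask, stx) < 0)
			return -errno;
	}

	if (need & STATX_PROJID) {
		struct fsxattr fsx;
		int fd, rc = 0;

		st->nprojid++;
		fd = open(path, O_RDONLY | O_NONBLOCK | O_NOCTTY | O_CLOEXEC);
		if (fd < 0)
			return -errno;
		if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
			rc = -errno;
		else
			*projid = fsx.fsx_projid;
		close(fd);
		if (rc < 0)
			return rc;
	}
	return 0;
}
```

With `need == 0` (a plain "%p" format) this makes no syscalls at all, and a "%s" format costs a single statx(). If LU-12480 lands, the open()/ioctl() branch could presumably be dropped and the bit passed straight to statx().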