[LU-15743] "lfs find" is missing "-xattr" support Created: 14/Apr/22  Updated: 05/Feb/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: Thomas Bertschinger
Resolution: Unresolved Votes: 0
Labels: lad23dd, lug23dd, medium

Issue Links:
Related
is related to LU-5170 lfs usability Open
is related to LU-13031 store JobID of program that created f... Resolved
is related to LU-16798 lfs find: new --jobid option Closed
is related to LU-15837 "lfs find -printf" improvements Open
is related to LU-17219 lfs find: add ability to print extend... Open
is related to LU-16760 "lfs find" support for fscrypt and ot... Resolved

 Description   

It would be useful if "lfs find" could check/match xattrs on files.

This should use "--xattr <name>" to match files that have a <name> xattr, and "--xattr-match <name>=<value>" to match files whose <name> xattr has a matching <value>, where <name> and <value> are shell patterns that can match arbitrary text strings in the xattr. The output would be printed in plain text with escaped control characters.
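As a rough illustration of "plain text with escaped control characters", the sketch below escapes a raw xattr value byte by byte. The octal "\ooo" convention, the escaping of backslash itself, and the helper name escape_xattr_value() are all assumptions for illustration, not an agreed-upon lfs output format.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the len bytes of a raw xattr value into out
 * (outsz bytes, NUL-terminated), writing a backslash as "\\" and any byte
 * outside printable ASCII (0x20-0x7e) as a three-digit octal escape "\ooo".
 * Returns the number of characters written (excluding the NUL), or -1 if
 * out is too small.
 */
static int escape_xattr_value(const unsigned char *val, size_t len,
			      char *out, size_t outsz)
{
	size_t o = 0;

	if (outsz == 0)
		return -1;

	for (size_t i = 0; i < len; i++) {
		unsigned char c = val[i];
		char tmp[5];	/* longest escape is "\377" plus NUL */
		int n;

		if (c == '\\')
			n = snprintf(tmp, sizeof(tmp), "\\\\");
		else if (c < 0x20 || c > 0x7e)
			n = snprintf(tmp, sizeof(tmp), "\\%03o", (unsigned int)c);
		else
			n = snprintf(tmp, sizeof(tmp), "%c", c);

		/* keep room for the terminating NUL */
		if (o + (size_t)n >= outsz)
			return -1;
		memcpy(out + o, tmp, (size_t)n);
		o += (size_t)n;
	}
	out[o] = '\0';
	return (int)o;
}
```

Because xattr values are length-delimited and may contain NUL bytes, the helper takes an explicit length rather than a C string; for example, a value of "a\nb" would be printed as a\012b.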



 Comments   
Comment by Gian-Carlo Defazio [ 01/May/23 ]

Grabbing this one for developer day.

Comment by Andreas Dilger [ 30/Aug/23 ]

defazio, did you ever make progress on this enhancement during developer day or afterward? Now that LU-16798 is closed, the ability to do "lfs find --xattr user.job=REGEXP" would be particularly useful.

Comment by Gian-Carlo Defazio [ 19/Sep/23 ]

adilger, sorry, I never got around to doing this. Go ahead and assign it to someone else if you'd like; I'm not sure when I'll get to it.

Comment by Thomas Bertschinger [ 11/Oct/23 ]

I've been looking into this and can submit a patch if this isn't already being worked on.
One question is whether this should have a printf directive.
Right now, if I wanted to print out the JobID of some files, I would have to do:

lfs find /mnt/lustre --xattr user.job | xargs getfattr -m user.job -d 

The right way to do this with a printf directive isn't obvious to me. For example, would we want to dump only the user namespace (since e.g. "lustre.lov" wouldn't be helpful to dump)? Would we want to dump every user-namespace xattr with newlines separating them, or use some other format? Would we want to dump only the xattrs that match the supplied --xattr and --xattr-match arguments?
Given these questions, I think just using xargs getfattr is the right way to go, since then we don't have to decide for the user how to print xattrs. But I wanted to ask about adding printf support in case it's important or desirable.

Comment by Andreas Dilger [ 11/Oct/23 ]

bertschinger, it is a good question you raise, and I haven't really thought about that aspect very much. On the one hand, it would be useful to have a -printf directive to allow printing specific xattrs if they are matched (e.g. sysadmin wants to know what is in "user.job" after finding files with a regexp). However, I don't think implementing that is a requirement for "lfs find --xattr" to be implemented. As you wrote, this could be achieved with other tools after the fact, but that also adds overhead for every file accessed when "lfs" may already have this information in memory after checking the file.

This should be filed as a separate improvement ticket, where we can discuss the right syntax and options for printing the xattr. I suspect something like what "getfattr" can do (dump in text, hex, or base64, with options for which xattrs to print), but the details are TBD once we have some time to think about it. I don't think this is the most critical gap in the tool, but it is a nice-to-have.

Comment by Thomas Bertschinger [ 12/Oct/23 ]

@Andreas thanks for the feedback. One other design / interface question...
The description has --xattr and --xattr-match to separately test for the presence of an xattr and to match its contents.
I think having two options is redundant: to test for the presence of an xattr without caring about its contents, a single <name>=<value> option could accept

--xattr "user.job="

or even

--xattr "user.job"

What do you think?
This is problematic if you want to test for an xattr with '=' in its name, but that's likely an uncommon case, especially if the main use is searching JobIDs. If it's important though, it probably needs to allow escaping the '='.
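A minimal sketch of how a single NAME[=VALUE] argument could be split, assuming backslash escaping of '=' as suggested above. parse_xattr_arg() is a hypothetical name, not part of lfs. Following the proposal in this comment, "user.job" and "user.job=" both mean "test only for the presence of the xattr".

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for an "--xattr NAME[=VALUE]" argument.
 * - Splits at the first '=' that is not escaped with a backslash.
 * - "\=" in NAME becomes a literal '='; other backslash pairs are copied
 *   unchanged so they keep their meaning as pattern escapes.
 * - An absent or empty VALUE sets *value to NULL ("xattr exists" test).
 * On success the caller must free() *name and *value.
 */
static bool parse_xattr_arg(const char *arg, char **name, char **value)
{
	char *buf = malloc(strlen(arg) + 1);
	size_t i = 0, o = 0;

	if (buf == NULL)
		return false;
	*value = NULL;

	while (arg[i] != '\0') {
		if (arg[i] == '\\' && arg[i + 1] != '\0') {
			if (arg[i + 1] != '=')
				buf[o++] = arg[i];	/* keep other escapes as-is */
			buf[o++] = arg[i + 1];
			i += 2;
		} else if (arg[i] == '=') {
			const char *v = arg + i + 1;
			size_t vlen = strlen(v);

			if (vlen > 0) {
				*value = malloc(vlen + 1);
				if (*value == NULL) {
					free(buf);
					return false;
				}
				memcpy(*value, v, vlen + 1);
			}
			break;
		} else {
			buf[o++] = arg[i++];
		}
	}
	buf[o] = '\0';
	*name = buf;
	return true;
}
```

Only the first unescaped '=' splits the argument, so a value may itself contain '=' (for example, "user.a\=b=x=y" yields the name "user.a=b" and the value "x=y").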
I did also consider the interface of other find implementations, but those that have xattr support seem to use --xattr to match the presence of any xattr at all (which isn't useful for Lustre, since every file will match). So I don't think compatibility with other find implementations is worthwhile here, but I wanted to bring it up for consideration.

Comment by Andreas Dilger [ 13/Oct/23 ]

I think the following would be most useful:

-xattr name_pattern[=value_pattern]
              File has at least one extended attribute with name that
              matches shell pattern name_pattern, and value that matches
              shell pattern value_pattern if specified.

While searching for user.job xattrs will be a common use case, I suspect that there will be other cases that need to be handled, so allowing a regexp for both the name and value is useful.
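The semantics of the proposed "-xattr name_pattern[=value_pattern]" test can be sketched with POSIX fnmatch(3), which implements shell-pattern matching. This assumes the file's xattr names and values have already been read (e.g. via listxattr(2) and getxattr(2)) into NUL-terminated strings; struct xattr_entry and xattr_test() are hypothetical names. If regular expressions are used instead, as discussed later in this ticket, only the two fnmatch() calls would change.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* One extended attribute; value assumed to be a NUL-terminated string. */
struct xattr_entry {
	const char *name;
	const char *value;
};

/*
 * Sketch of "-xattr name_pattern[=value_pattern]": true if at least one
 * xattr name matches name_pat and, when val_pat is non-NULL, the value
 * of that same xattr matches val_pat. Both are shell patterns.
 */
static bool xattr_test(const struct xattr_entry *xa, size_t count,
		       const char *name_pat, const char *val_pat)
{
	for (size_t i = 0; i < count; i++) {
		if (fnmatch(name_pat, xa[i].name, 0) != 0)
			continue;
		if (val_pat == NULL || fnmatch(val_pat, xa[i].value, 0) == 0)
			return true;
	}
	return false;
}
```

Requiring the name and value to match on the same xattr avoids a false positive where one xattr matches the name pattern and a different xattr matches the value pattern.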

Comment by Gerrit Updater [ 23/Oct/23 ]

"Thomas Bertschinger <bertschinger@lanl.gov>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52804
Subject: LU-15743 utils: add --xattr option to lfs find
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ab05f6780984bf74487cb284ec85bfcf31991d60

Comment by Andreas Dilger [ 24/Oct/23 ]

There is a prototype patch for "GNU find" at https://gitlab.com/mweetman/findutils that adds "-xattr" support. According to the find.1 man page in that patch, the "-xattr PATTERN" option takes a regular expression for PATTERN by default, but it also allows the "-regextype TYPE" option to change the regular expression type to one of emacs, posix-awk, posix-basic, posix-egrep, and posix-extended. I don't think there is any urgent need for this, since "lfs find" implements neither the "-path" nor the "-regex" test.

Comment by Thomas Bertschinger [ 25/Oct/23 ]

Andreas, adding a question here instead of on the Gerrit review about the regex matching issue, since it's more of a user interface / design question.

I'm not really thrilled that the regex is implicitly matching only part of the string for both the name and value, rather than explicitly requiring a wildcard for the remainder. That makes it harder to match names/values that are substrings of each other, like "system.acl" and "system.acl_default" or whatever.
I would have assumed that it needs a regex "trusted.*" to explicitly match the names of all "trusted." xattrs. Otherwise, it will be harder to match substrings in the xattr name. I see that this is how getfattr works for "-m", but conversely "find -regex f" does not return all files with "f" in them (not for any regextype); that only happens when "find -regex '.*f.*'" or "find -path '*f*'" is used.

It looks like GNU find uses a non-POSIX regex interface from Gnulib, and its matching function re_match returns the number of chars matched. So GNU find accomplishes this with basically

if (re_match(..., path, ...) == strlen(path))
...

It should be possible to achieve the same result with the POSIX regex interface using the pmatch[] argument to regexec, like this:

regmatch_t pmatch[1];

/* re must be compiled without REG_NOSUB so that pmatch[] is filled in */
if (regexec(re, input, 1, pmatch, 0) == 0 &&
    pmatch[0].rm_so == 0 &&
    (size_t)pmatch[0].rm_eo == strlen(input)) {
        /* matched entire input string */
}
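A self-contained version of the pmatch approach is sketched below. The pattern has to be compiled without REG_NOSUB, because POSIX regexec() ignores nmatch and pmatch for patterns compiled with that flag. Since POSIX regexec() reports the leftmost-longest match, a match that starts at offset 0 and covers the whole string is the one reported whenever such a match exists. regex_full_match() is a hypothetical helper, and the use of REG_EXTENDED is an assumption; the ticket does not say which regex flavor lfs uses.

```c
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only if pattern matches all of input. */
static bool regex_full_match(const char *pattern, const char *input)
{
	regex_t re;
	regmatch_t pmatch[1];
	bool full = false;

	/* no REG_NOSUB: regexec() must fill in pmatch[0] */
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return false;

	if (regexec(&re, input, 1, pmatch, 0) == 0 &&
	    pmatch[0].rm_so == 0 &&
	    (size_t)pmatch[0].rm_eo == strlen(input))
		full = true;

	regfree(&re);
	return full;
}
```

With this helper, "system\.acl" matches the name "system.acl" but not "system.acl_default", and a user who wants prefix matching writes "trusted\..*" explicitly instead of relying on implicit substring matches.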

Alternatively, with the current behavior, matching the entire string could be accomplished by the user with anchors: 

lfs find --xattr "^user.*$" /mnt/lustre

This could be documented in the man page to make it clear to users who want this behavior.

A final option could be to use the Gnulib regex instead of POSIX regex for lfs find, but I'm not sure if that really makes sense.

What do you think is the best option?

Comment by Andreas Dilger [ 25/Oct/23 ]

I had suggested using regcomp(..., REG_NOSUB), but I think the pmatch option is best. I think it is non-intuitive to make the user specify '^' or '$' to terminate the regexp.

Generated at Sat Feb 10 03:20:55 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.