[LU-15743] "lfs find" is missing "-xattr" support


    Description

      It would be useful if "lfs find" could check/match xattrs on files.

      This should use "--xattr <name>" to match files that have a <name> xattr, and "--xattr-match <name>=<value>", where <name> and <value> are shell patterns that can match arbitrary text strings in the xattr. The output would be printed in plain text with escaped control characters.
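The ticket does not specify the escaping scheme for printed values. The following is a minimal sketch that assumes C-style escapes (`\n`, `\t`, `\\`) and three-digit octal escapes for every other non-printable byte. `xattr_escape` is a hypothetical helper, not part of lfs.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (escaping scheme is an assumption): write VALUE
 * (LEN bytes, not necessarily NUL-terminated, since xattr values may be
 * binary) into OUT as printable text. Backslash, newline and tab get
 * C-style escapes; any other non-printable byte becomes "\ooo" octal.
 * OUT must hold at least 4 * LEN + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
static size_t xattr_escape(const unsigned char *value, size_t len, char *out)
{
	size_t n = 0;

	for (size_t i = 0; i < len; i++) {
		unsigned char c = value[i];

		if (c == '\\') {
			out[n++] = '\\';
			out[n++] = '\\';
		} else if (c == '\n') {
			out[n++] = '\\';
			out[n++] = 'n';
		} else if (c == '\t') {
			out[n++] = '\\';
			out[n++] = 't';
		} else if (isprint(c)) {
			out[n++] = (char)c;
		} else {
			/* backslash + 3 octal digits (max \377) */
			n += (size_t)sprintf(out + n, "\\%03o", (unsigned)c);
		}
	}
	out[n] = '\0';
	return n;
}
```

In the C locale, bytes at or above 0x80 are not printable, so they are emitted as octal. A UTF-8-aware tool might choose to pass them through instead.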


          Activity

            pjones Peter Jones added a comment -

            Landed for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52804/
            Subject: LU-15743 utils: add --xattr option to lfs find
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 978ff35d39e3f640a2bfc766b97982012ce07a80

            adilger Andreas Dilger added a comment -

            I had suggested using regcomp(..., REG_NOSUB), but I think the pmatch option is best. I think it is non-intuitive to make the user specify '^' or '$' to anchor the regexp.
            bertschinger Thomas Bertschinger added a comment - - edited

            Andreas, adding a question here instead of on the Gerrit review about the regex matching issue, since it's more of a user interface / design question.

            I'm not really thrilled that the regex implicitly matches only part of the string for both the name and the value, rather than requiring an explicit wildcard for the remainder. That makes it harder to match names/values that are substrings of each other, like "system.acl" and "system.acl_default".
            I would have assumed that it needs a regex "trusted.*" to explicitly match the names of all "trusted.*" xattrs. Otherwise, it will be harder to match xattr names that are substrings of other names. I see that this is how getfattr works for "-m", but conversely "find -regex f" does not return all files with "f" in their names (for any regextype); that only happens with "find -regex '.*f.*'" or "find -path '*f*'".

            It looks like GNU find uses a non-POSIX regex interface from Gnulib, whose matching function re_match returns the number of characters matched. So GNU find accomplishes this with, basically:

            if (re_match(..., path, ...) == strlen(path))
            ...

            It should be possible to achieve the same result with the POSIX regex interface using the pmatch[] argument to regexec, like this:

            regmatch_t pmatch[1];
            
            /* pmatch[] is only filled in when regexec() returns 0 */
            if (regexec(re, input, 1, pmatch, 0) == 0 &&
                pmatch[0].rm_so == 0 && pmatch[0].rm_eo == (regoff_t)strlen(input)) {
                    /* matched entire input string */
            }
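Here is a self-contained version of the pmatch approach. It is a sketch, not necessarily the code that landed in the patch. The helper name `regex_full_match` and the use of `REG_EXTENDED` are assumptions. POSIX requires regexec() to report the leftmost-longest match, so a match that starts at offset 0 and ends at strlen(input) covers the whole string.

```c
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only if the POSIX extended regex PATTERN
 * matches all of INPUT, not just a substring. A pattern that fails to
 * compile is treated as "no match" in this sketch. */
static bool regex_full_match(const char *pattern, const char *input)
{
	regex_t re;
	regmatch_t pmatch[1];
	bool full = false;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return false;
	/* pmatch[0] is only valid when regexec() returns 0 */
	if (regexec(&re, input, 1, pmatch, 0) == 0)
		full = pmatch[0].rm_so == 0 &&
		       pmatch[0].rm_eo == (regoff_t)strlen(input);
	regfree(&re);
	return full;
}
```

With this check, "trusted.*" matches "trusted.lov", but "system\.acl" no longer matches "system.acl_default".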

            Alternatively, with the current behavior, the user can match the entire string by using anchors:

            lfs find --xattr "^user.*$" /mnt/lustre

            This could be documented/suggested in the man page to make it clear to the user if they want this behavior.
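The following is a minimal sketch of the current, unanchored semantics and the anchor workaround, using plain POSIX regcomp()/regexec() with REG_EXTENDED | REG_NOSUB. `regex_search` is a hypothetical helper name. An unescaped '.' matches any character, so ^user\..*$ is slightly stricter than ^user.*$.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper with default regexec() semantics: true if
 * PATTERN matches anywhere in INPUT. REG_NOSUB suffices because no
 * match offsets are needed. */
static bool regex_search(const char *pattern, const char *input)
{
	regex_t re;
	bool found;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	found = regexec(&re, input, 0, NULL, 0) == 0;
	regfree(&re);
	return found;
}
```

Without anchors, "system\.acl" also matches "system.acl_default". With "^system\.acl$", it matches only "system.acl".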

            A final option could be to use the Gnulib regex instead of POSIX regex for lfs find, but I'm not sure if that really makes sense.

            What do you think is the best option?


            adilger Andreas Dilger added a comment -

            There is a prototype patch for "GNU find" at https://gitlab.com/mweetman/findutils that adds "-xattr" support. According to the find.1 man page in that patch, the "-xattr PATTERN" option takes a regular expression for PATTERN by default, but it also allows the "-regextype TYPE" option to change the regular expression type to one of emacs, posix-awk, posix-basic, posix-egrep, and posix-extended. I don't think there is any urgent need for this, since "lfs find" implements neither the "-path" nor the "-regex" test.

            gerrit Gerrit Updater added a comment -

            "Thomas Bertschinger <bertschinger@lanl.gov>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52804
            Subject: LU-15743 utils: add --xattr option to lfs find
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ab05f6780984bf74487cb284ec85bfcf31991d60

            adilger Andreas Dilger added a comment -

            I think the following would be most useful:

            -xattr name_pattern[=value_pattern]
                          File has at least one extended attribute with name that
                          matches shell pattern name_pattern, and value that matches
                          shell pattern value_pattern if specified.

            While searching for user.job xattrs will be a common use case, I suspect that there will be other cases that need to be handled, so allowing a regexp for both the name and value is useful.
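The following is a minimal sketch of the proposed name_pattern[=value_pattern] semantics using fnmatch(3). It assumes one file's xattr names and values are already in memory as NUL-terminated strings, although real xattr values can be binary. `struct xattr_pair` and `xattr_matches` are hypothetical names, not lfs code. Unlike an unanchored regexec(), fnmatch() always matches the whole string, so "system.acl" does not match "system.acl_default" unless a trailing '*' is added.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory view of one xattr of a file. */
struct xattr_pair {
	const char *name;
	const char *value;
};

/* True if at least one xattr name matches shell pattern NAME_PAT and,
 * when VALUE_PAT is non-NULL, that same xattr's value matches
 * VALUE_PAT. Flags are 0, so '*' also matches '.' and '/'. */
static bool xattr_matches(const struct xattr_pair *xattrs, size_t count,
			  const char *name_pat, const char *value_pat)
{
	for (size_t i = 0; i < count; i++) {
		if (fnmatch(name_pat, xattrs[i].name, 0) != 0)
			continue;
		if (value_pat == NULL ||
		    fnmatch(value_pat, xattrs[i].value, 0) == 0)
			return true;
	}
	return false;
}
```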

            bertschinger Thomas Bertschinger added a comment -

            @Andreas thanks for the feedback. One other design / interface question...
            The description has --xattr and --xattr-match to separately test for the presence of an xattr and to match its contents.
            I think having two options is redundant: to test for the presence of an xattr without caring about its contents, a single <name>=<value> option could accept

            --xattr "user.job="

            or even

            --xattr "user.job"

            What do you think?
            This is problematic if you want to test for an xattr with '=' in its name, but that's likely an uncommon case, especially if the main use is searching JobIDs. If it's important, though, it probably needs to allow escaping the '='.
            I did also consider the interface of other find implementations, but those that have xattr support seem to use --xattr to match the presence of any xattr at all (which isn't useful for Lustre, since every file will match). So I don't think compatibility with other finds is worthwhile here, but I wanted to bring it up for consideration.
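The following sketch shows one possible way to allow escaping the '='. It assumes the name part is later passed to fnmatch(3) as a shell pattern. The argument is split at the first '=' that is not escaped with a backslash, and the backslash is left in place, because fnmatch() without FNM_NOESCAPE already treats "\=" as a literal '='. `split_xattr_arg` is a hypothetical helper, not the parser in the landed patch.

```c
#include <stddef.h>

/* Hypothetical parser for a "--xattr NAME[=VALUE]" argument. Splits ARG
 * in place at the first unescaped '='. *NAME points at the name
 * pattern; *VALUE is NULL when there is no unescaped '=', and "" for a
 * trailing '=' (so "user.job" and "user.job=" can be told apart). */
static void split_xattr_arg(char *arg, char **name, char **value)
{
	*name = arg;
	*value = NULL;
	for (char *p = arg; *p != '\0'; p++) {
		if (*p == '\\' && p[1] != '\0') {
			p++;            /* skip the escaped character */
		} else if (*p == '=') {
			*p = '\0';      /* terminate the name pattern */
			*value = p + 1; /* may be "" for "user.job=" */
			return;
		}
	}
}
```

Whether "user.job=" should then mean "any value" or "empty value" is a design choice the caller would still have to make.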

            adilger Andreas Dilger added a comment -

            bertschinger, it is a good question you raise, and I haven't really thought about that aspect very much. On the one hand, it would be useful to have a -printf directive to allow printing specific xattrs if they are matched (e.g. a sysadmin wants to know what is in "user.job" after finding files with a regexp). However, I don't think implementing that is a requirement for "lfs find --xattr" to be implemented. As you wrote, this could be achieved with other tools after the fact, but that also adds overhead for every file accessed when "lfs" may already have this information in memory after checking the file.

            This should be filed as a separate improvement ticket, and we can discuss there the right syntax and options for printing the xattr. I suspect something similar to what "getfattr" can do (dump in text, hex, base64, with options for which xattrs to print), but details are TBD once we have some time to think about it. I don't think this is the most critical gap in the tool, but it is nice to have.

            bertschinger Thomas Bertschinger added a comment -

            I've been looking into this and can submit a patch if this isn't already being worked on.
            One question is whether this should have a printf directive.
            Right now, if I wanted to print out the JobID of some files, I would have to do

            lfs find /mnt/lustre --xattr user.job | xargs getfattr -m user.job -d

            The right way to do this with a printf directive isn't obvious to me. For example, would we want to dump only the user namespace (since e.g. "lustre.lov" wouldn't be helpful to dump)? Would we want to dump every user namespace xattr with newlines separating them, or in some other format? Would we want to dump only xattrs that match the supplied --xattr and --xattr-match arguments?
            Given these questions, I think just using xargs getfattr is the right way to go, since then we don't have to make decisions for the user on how to print xattrs. But I wanted to ask about adding printf support in case it's important/desirable.
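The per-file overhead of the xargs getfattr pipeline comes from looking each xattr up again for every path. The following is a minimal sketch that assumes Linux's <sys/xattr.h>. Reading one xattr such as user.job costs a size probe plus a second getxattr(2) call per file. `print_xattr` and its output format are hypothetical, not what getfattr or lfs print.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: print "PATH: NAME=VALUE" for one xattr of PATH.
 * Returns 0 on success, or -1 with errno set (e.g. ENODATA if the xattr
 * is absent, ENOENT if PATH does not exist). The value is printed with
 * %.*s, so a binary value would need escaping first. */
static int print_xattr(const char *path, const char *name)
{
	ssize_t size = getxattr(path, name, NULL, 0); /* size probe */
	char *buf;
	int saved_errno;

	if (size < 0)
		return -1;
	/* malloc(0) may return NULL, so allocate at least 1 byte */
	buf = malloc(size > 0 ? (size_t)size : 1);
	if (buf == NULL)
		return -1;
	/* can fail with ERANGE if the value grew since the probe */
	size = getxattr(path, name, buf, (size_t)size);
	if (size >= 0)
		printf("%s: %s=%.*s\n", path, name, (int)size, buf);
	saved_errno = errno;
	free(buf);
	errno = saved_errno;
	return size < 0 ? -1 : 0;
}
```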

            People

              bertschinger Thomas Bertschinger
              adilger Andreas Dilger
              Votes: 0
              Watchers: 6
