LU-13055: add ability for named Changelog consumers

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.8, Lustre 2.15.0

    Description

      Currently, Lustre Changelog consumers are always given numbered names such as "cl1" or "cl14". It would be useful to be able to declare the changelog username (e.g. "cl-rbh" or "cl-audit") so that it is clear who the Changelog users are, and to avoid duplicate changelog registrations. Without this, an application can lose track of its original Changelog user registration and register a new user, while the old, no longer consumed user causes Changelog records to accumulate.

          Activity

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43741/
            Subject: LU-13055 libcfs: allow comma-separated masks
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6b6fde1026311a28595ea43af56392ca6ad24d79

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43892
            Subject: LU-13055 mdd: make current changelog mask writable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 11e30ad4770811df392870e09a92dda022574ee6

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43741
            Subject: LU-13055 libcfs: allow comma-separated masks
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 17ed95152ff513dcfe4c1a8f36e66b8bfeeba895

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43710
            Subject: LU-13055 mdd: don't assert on unknown changelog lrh_type
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 1259811054a0e41f7fe646e8ad05307d693549f7

            tappro Mikhail Pershin added a comment -

            The patch is updated. For now, name uniqueness is not enforced; if you think it must be, let me know. Another patch for the b2_12 branch is on its way to provide better compatibility during downgrade.

            tappro Mikhail Pershin added a comment -

            Well, I was asking mostly because requiring unique names means scanning the changelog users for all registered names on each new user registration, whereas right now only the ID is incremented. That would require modifying the patch.

            Another problem I have right now is backward compatibility. A new name format like 'cl$ID-$NAME' means that our scripts, and any others on the customer side, will get the changelog user name in that format from the new server, and a later call to 'lfs changelog_clear' would return an error because of the sscanf() format in chgl_write(). It is not a problem to change that, or to make 'lfs' skip the '-$NAME' suffix and still use the cl$ID part, but those are client-side changes, which means an old client is not compatible with the new server's changelog names. I am not sure whether a client would want to work with a name registered from another client, so this is probably not a problem; if it is, I have no good solution other than a compatibility patch for older Lustre clients.
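The compatibility concern above can be illustrated with a small sketch. This is not the actual `lfs` or `chgl_write()` parsing code: `parse_strict` is only a stand-in for an older client that accepts nothing but 'cl$ID', and `parse_tolerant` shows the idea of skipping the '-$NAME' suffix. Both function names and the regular expressions are assumptions made for illustration.

```python
import re

# Stand-in for an old client that only understands "cl<ID>" (assumption,
# not the real lfs parsing code).
STRICT = re.compile(r"cl(\d+)")
# Tolerant form: "cl<ID>" optionally followed by "-<NAME>".
TOLERANT = re.compile(r"cl(\d+)(?:-([A-Za-z][A-Za-z0-9-]*))?")


def parse_strict(user):
    """Return the numeric ID, rejecting anything but 'cl<ID>'."""
    m = STRICT.fullmatch(user)
    if m is None:
        raise ValueError(f"unrecognized changelog user: {user!r}")
    return int(m.group(1))


def parse_tolerant(user):
    """Return (ID, name-or-None), accepting 'cl<ID>' and 'cl<ID>-<NAME>'."""
    m = TOLERANT.fullmatch(user)
    if m is None:
        raise ValueError(f"unrecognized changelog user: {user!r}")
    return int(m.group(1)), m.group(2)
```

With the tolerant form, a client that receives 'cl7-lamigo' from a newer server can still recover ID 7, while the strict stand-in rejects the same string.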

            adilger Andreas Dilger added a comment -

            Mike, the goal is for a service to be able to determine at startup whether it already has a registered changelog user. Otherwise, if there are two changelog users "cl2" and "cl7", there is no way for the user or administrator to know whether those are being used by a particular service (audit, HSM, etc.), whether a new user needs to be registered because the old one was idle and removed, or whether the users are for some service that is no longer running. Allowing the service name in the username makes it totally clear that "cl2-audit" is for the audit service, "cl7-lamigo" is for lamigo, etc.

            Allowing only a single named user for a given service makes sense, since otherwise it is again ambiguous whether both are in use or one of them is stale. Then, if the service starts up and doesn't know what its changelog user is, it could first scan for it, or just try to register with the name and be given the old user back. If some service really needs to register two changelog users on the same MDT, then it can pick two different names (e.g. "robinhood" and "rbh-tmp-scan" or whatever).
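The "register with the name and be given the old user back" behavior proposed above could look roughly like the following toy model. It is an in-memory sketch of the proposal only, not of what the merged patch does; the `ChangelogUsers` class and its methods are hypothetical.

```python
class ChangelogUsers:
    """Toy in-memory model of changelog users on one MDT (hypothetical)."""

    def __init__(self):
        self._next_id = 1
        self._by_name = {}  # service name -> numeric ID

    def _alloc_id(self):
        # IDs are handed out sequentially, as with plain "cl<ID>" users.
        uid = self._next_id
        self._next_id += 1
        return uid

    def register(self, name=None):
        """Register a user; a repeated name returns the existing user."""
        if name is None:
            return f"cl{self._alloc_id()}"
        if name not in self._by_name:
            self._by_name[name] = self._alloc_id()
        return f"cl{self._by_name[name]}-{name}"
```

A service that restarts and calls `register("audit")` again gets "cl2-audit" back instead of creating a second, possibly stale, user.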

            tappro Mikhail Pershin added a comment -

            John, as for the same names:

            > we should ensure that $NAME is valid according to the same rules as above and that it is unique on the MDT. So if cl42-lamigo is already registered then we cannot register a new changelog with name lamigo on the same MDT.

            If we are using the cl$ID-$NAME format, then the name could be the same because the ID is always unique, and even with the same name these are different changelog users. So what is the point of requiring name uniqueness?

            jhammond John Hammond added a comment -

            I think we should require a cl prefix on named changelogs. This will be less likely to break scripts that try to parse the changelog_users file.

            > there are couple questions related to proposed functionality. First, what name format will we allow? For example, name 'user1' would cause combined result as 'user11' or similar which is undistinguished from 'user' with ID 11. So either we prohibit digits in name or need separator like '.' between them, so that would be 'user1.1' and 'user.11'

            I had imagined that we would use something like cl-$NAME. For example cl-lamigo. The - makes splitting between numbered and named changelogs easier. To make it totally unambiguous we should require that $NAME starts with a letter and contains only letters, digits, and dashes.

            Alternatively we could allow specifying $NAME when the changelog is registered. But then the actual changelog created is named cl$NUM-$NAME where $NUM is allocated sequentially as before. In this case we should ensure that $NAME is valid according to the same rules as above and that it is unique on the MDT. So if cl42-lamigo is already registered then we cannot register a new changelog with name lamigo on the same MDT.
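A minimal sketch of the naming rule described above ($NAME starts with a letter and contains only letters, digits, and dashes) and of the cl$NUM-$NAME composition. The helper names `is_valid_name` and `full_user_name` are illustrative assumptions, not Lustre functions, and the rule shown is the one proposed in this comment rather than a confirmed final implementation.

```python
import re

# $NAME: first character is a letter, the rest are letters, digits, or dashes.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_valid_name(name):
    """Check $NAME against the proposed rule."""
    return NAME_RE.fullmatch(name) is not None


def full_user_name(num, name=None):
    """Build 'cl<NUM>' or 'cl<NUM>-<NAME>' from a sequential number."""
    if name is None:
        return f"cl{num}"
    if not is_valid_name(name):
        raise ValueError(f"invalid changelog user name: {name!r}")
    return f"cl{num}-{name}"
```

Because a dot is not in the allowed character set, a name such as "a.b" is rejected, which matches the request later in this comment to keep dots out of changelog names.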

            > Another question is about deregister, right now tool demands it in form 'cl<ID>' and extract just ID to proceed with.

            I think it's much better if we can support a deregister command that uses a name. But if we use the cl$NUM-$NAME scheme and cl42-lamigo is registered then I think changelog_deregister cl42 should deregister cl42-lamigo.
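As a sketch of the deregister lookup suggested here, the function below resolves an argument against a list of registered users. It accepts the full form, the cl$NUM form, and, as an extra assumption, the bare name; `find_user` is a hypothetical helper and not part of any Lustre tool.

```python
def find_user(registered, arg):
    """Return the registered user that `arg` refers to, or None.

    `registered` holds names like 'cl7' or 'cl42-lamigo'. `arg` may be the
    full name, the 'cl<NUM>' part, or the bare '<NAME>' part.
    """
    for user in registered:
        # Split at the first dash: 'cl42-lamigo' -> ('cl42', '-', 'lamigo').
        ident, _, name = user.partition("-")
        if arg in (user, ident) or (name and arg == name):
            return user
    return None
```

Matching on the whole 'cl<NUM>' token rather than on a string prefix keeps 'cl4' from accidentally selecting 'cl42-lamigo'.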

            > So either we prohibit digits in name or need separator like '.' between them, so that would be 'user1.1' and 'user.11'

            I don't like dots. Please don't use them or allow them here. They'll be terrible if we ever want to use the changelog name in a param.

            tappro Mikhail Pershin added a comment -

            There are a couple of questions related to the proposed functionality. First, what name format will we allow? For example, the name 'user1' with ID 1 would produce a combined result of 'user11', which is indistinguishable from 'user' with ID 11. So either we prohibit digits in the name or we need a separator like '.' between them, giving 'user1.1' and 'user.11'.

            Another question is about deregistration. Right now the tool requires the form 'cl<ID>' and extracts just the ID to proceed with. While keeping the old 'cl<ID>' format for compatibility is OK, wouldn't it be useful to also support a pure ID form, 'deregister ID', and/or a name+ID form, 'deregister name<ID>'? In the latter case we again have the first problem of properly separating the name and the ID.
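The ambiguity raised in the first question can be shown in a few lines. The sketch below uses a dash as the separator purely as an assumption for illustration; the helper names are hypothetical.

```python
def join_plain(name, uid):
    """Concatenate name and ID with no separator (ambiguous)."""
    return f"{name}{uid}"


def join_sep(name, uid, sep="-"):
    """Concatenate name and ID with a separator."""
    return f"{name}{sep}{uid}"


def split_sep(combined, sep="-"):
    """Recover (name, ID) by splitting at the last separator."""
    name, _, uid = combined.rpartition(sep)
    return name, int(uid)
```

Without a separator, ('user1', 1) and ('user', 11) both yield 'user11'. With one, the two stay distinct, and splitting at the last separator recovers the ID even when the name itself contains digits.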

            adilger Andreas Dilger added a comment -

            Mike, the attached patch was my initial attempt at creating named changelog users with a configurable mask per user (LU-13338).

            It has a reasonable start of how to configure the named users. They are currently of the form <name><number>, in order to ensure uniqueness, but John should chime in on whether it would be better to use only <name> and refuse to create multiple users with the same <name>. If no <name> is specified, then using "cl" would be best.

            There is also some work for per-user masks, to include a field in the config record to store the mask and check it at startup time. The idea is that the currently-set mask would be the union of the masks specified by all of the registered users, also including the global mask (if set). For compatibility reasons, users registered without a mask would need to enable the full default mask, so it would be desirable for most users to delete the old user and create a new limited-mask user.

            The mask in the config record is fine for specifying the mask at creation time, but it would be somewhat tricky to change the mask for a specific user afterward. For most users that should be OK, since they would only be processing records that they originally registered for. If necessary, they could drain their pending log records, delete their user ID, and re-register their user with a different mask. If it wasn't possible to drain the log, setting the global mask could be used to temporarily enable additional records as needed until that could be done, so I don't think that is critical to implement in the first pass.
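The per-user mask idea described above (the effective mask is the union of all registered users' masks plus the global mask, and a user registered without a mask enables the full default mask) can be sketched as follows. The bit values and names here are illustrative placeholders, not Lustre's real changelog record type constants.

```python
# Hypothetical record-type bits, for illustration only.
CREATE, UNLINK, RENAME, SETATTR = 1, 2, 4, 8
DEFAULT_MASK = CREATE | UNLINK | RENAME | SETATTR


def effective_mask(user_masks, global_mask=0, default_mask=DEFAULT_MASK):
    """Union of the global mask and every registered user's mask.

    `user_masks` maps a user name to its mask, or to None for a user
    registered without a mask, which counts as the full default mask.
    """
    mask = global_mask
    for user_mask in user_masks.values():
        mask |= default_mask if user_mask is None else user_mask
    return mask
```

One legacy user without a mask is enough to widen the effective mask to the full default, which is why the comment recommends replacing such users with new limited-mask ones.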

            People

              tappro Mikhail Pershin
              adilger Andreas Dilger
              Votes: 0
              Watchers: 7
