HSM copytool event monitoring capabilities

Details

    • 10799

    Description

      This ticket is to track the work being done to add copytool event monitoring capabilities to liblustreapi. The end result will be that external monitoring agents are able to read an event stream out of a FIFO.
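      As an illustration of the reading side, the following minimal Python sketch shows how an external monitoring agent might consume such a stream. It assumes one JSON object per line, which is an assumption about framing rather than something this ticket specifies; the FIFO path and the function names are hypothetical.

```python
import json


def parse_events(lines):
    """Yield decoded events, skipping lines that are not complete JSON objects."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # A truncated (partially written) event fails to parse and is dropped.
            continue
        if isinstance(event, dict):
            yield event


def read_events(fifo_path):
    """Read events from a FIFO (hypothetical path) until every writer closes it."""
    # Opening a FIFO for reading blocks until a writer attaches.
    with open(fifo_path, "r", encoding="utf-8") as fifo:
        yield from parse_events(fifo)
```

      A monitoring agent could then loop over `read_events("/path/to/events.fifo")` and act on each event dictionary; the loop ends when the last writer closes the FIFO.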

          Activity

            [LU-4020] HSM copytool event monitoring capabilities
            pjones Peter Jones added a comment - Landed for 2.6
            bogl Bob Glossman (Inactive) added a comment - backport to b2_5; http://review.whamcloud.com/9512

            mjmac Michael MacDonald (Inactive) added a comment - Attached a small proof-of-concept for generating valid JSON with PyYAML. The main takeaway is that the JSON we are currently generating is indeed valid YAML. Given that the DLC patches that bring in libyaml have not yet landed on master, and won't land for 2.5.x, I propose that we move forward with the existing simple JSON generator, but plan to replace it with libyaml when that becomes available.

            bfaccini Bruno Faccini (Inactive) added a comment - I have pushed a new patch-set #6 for http://review.whamcloud.com/7790. After rebasing, I tried to address the many comments from previous patch-sets.

            Andreas, is the new liblustreapi_json.c what you wanted? I am not really familiar with these licensing protocols or their packaging needs … What about the definitions of the specific data structures being used: do they need to be in a separate .h file too, with the appropriate header?

            Jinshan, I did not remove the list-head structure llapi_json_item_list, because I find the code easier to read with it than without.

            jcl jacques-charles lafoucriere added a comment - RBH runs on some client. This client may not be connected to the external storage, as the agent is, so it will have difficulty communicating with it. Also, there is a single instance of RBH, so we may have a scalability issue. The agent count is easy to increase, so even a highly verbose CT will scale.

            keith Keith Mannthey (Inactive) added a comment - I mean Robinhood. I believe it is aware of the state of the filesystem. It seems this plan is to capture the profile it exports in some other layer of Lustre. Why not just use Robinhood (or some other Policy Agent) directly to resolve the state of the filesystem?

            jcl jacques-charles lafoucriere added a comment - What do you mean by "policy agent"? The "coordinator" or the "policy engine" (i.e. RBH)? The initial idea was to provide a standard interface for external backend tools. Since the CT runs on a standard Lustre client, implementing it in liblustreapi for the CT is the natural way.

            keith Keith Mannthey (Inactive) added a comment - Why not just get info from the policy agent? It knows the state of the entire FS.

            mjmac Michael MacDonald (Inactive) added a comment - I thought that it might be best to move discussion about this work from gerrit to this ticket. As I indicated in the commit message for the review I pushed, my intent was to prove the concept and get some feedback on the plan, so I am happy to see this conversation happening.

            I'll respond to general feedback on some of the higher-level topics here so that we can keep the discussion going:

            jhammond: I did consider a socket (unix, udp) based approach, but it seemed to add complexity to the implementation without really adding much benefit over the FIFO approach. My goal wasn't to make a completely reliable event stream – I was thinking more of making it best effort. If there is a reader to see the events, great. If not, life goes on and there's no negative impact on the copytool instance. I was careful to handle cases where the copytool started without a reader (works OK) or where the reader disappeared at various points (OK, in my testing).

            adilger: JSON is actually a subset of YAML. YAML parsers can read JSON just fine, though the reverse isn't true. I decided to use JSON because the event format doesn't need all of YAML's capabilities, and it's much easier to generate correct JSON. JSON is also easier to validate on the reader side because it's simple. It's very easy to detect partial writes of JSON-formatted events, for example.

            All that having been said, I'm not opposed to the idea of using pure YAML, especially if someone else is writing or linking in a YAML library. In my opinion, though, JSON is probably good enough for 99% of what we need as far as structured output goes.

            jcl: I will certainly add tests for the final implementation, but thank you for calling it out. My initial focus was to get some code working in order to test ideas and generate discussion before committing to a final design. As far as libraries go, I am not opposed to the idea of using a well-tested library to generate JSON and/or YAML – I just wasn't sure what the reception would be to adding external dependencies like that. There is an MIT-licensed library called Jansson that seems mature and well-maintained.

            I think that covers most of the high-level topics. There's been a lot of really great feedback on implementation details too, and I appreciate that. I will certainly incorporate those improvements into the code as I make progress.
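            The best-effort FIFO behaviour described in this comment can be sketched in Python. This is only an illustration (the actual work targets C in liblustreapi), and the function names are hypothetical. It relies on standard POSIX FIFO semantics: opening a FIFO write-only with O_NONBLOCK fails with ENXIO when no reader is attached, and writing after the reader has gone away fails with EPIPE.

```python
import errno
import json
import os


def open_fifo_nonblocking(path):
    """Return a write descriptor for the FIFO, or None if no reader is attached."""
    try:
        return os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            # The FIFO exists but nobody has it open for reading: carry on without it.
            return None
        raise


def emit_event(fd, event):
    """Best-effort write of one event; return False if the event was dropped."""
    if fd is None:
        return False
    line = (json.dumps(event) + "\n").encode("utf-8")
    try:
        # For lines of at most PIPE_BUF bytes, a non-blocking pipe write is
        # all-or-nothing, so a reader never sees half of such an event.
        os.write(fd, line)
        return True
    except (BrokenPipeError, BlockingIOError):
        # Reader disappeared (EPIPE) or the pipe is full (EAGAIN): drop the event.
        # Python ignores SIGPIPE by default, so EPIPE surfaces as an exception.
        return False
```

            A caller would open the FIFO once at startup, retry `open_fifo_nonblocking` later if it returned None, and treat a False return from `emit_event` as a dropped event rather than an error.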

            mjmac Michael MacDonald (Inactive) added a comment - Hmm, yes. Updated the description, thanks.

            People

              mjmac Michael MacDonald (Inactive)
              mjmac Michael MacDonald (Inactive)
              Votes: 0
              Watchers: 12