[LU-4020] HSM copytool event monitoring capabilities Created: 27/Sep/13 Updated: 20/Mar/15 Resolved: 05/Mar/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Michael MacDonald (Inactive) | Assignee: | Michael MacDonald (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HSM | ||
| Rank (Obsolete): | 10799 | ||||||||||||
| Description |
|
This ticket is to track the work being done to add copytool event monitoring capabilities to liblustreapi. The end result will be that external monitoring agents are able to read an event stream out of a FIFO. |
| Comments |
| Comment by Michael MacDonald (Inactive) [ 27/Sep/13 ] |
|
Pushed a proof-of-concept for discussion: http://review.whamcloud.com/7790

To play with it, you'll need to create a FIFO and then cat it before starting the copytool in a different shell, e.g.:

[root@europa ~]# mknod /var/spool/hsm_events p && cat /var/spool/hsm_events
{"event_time": "2013-09-27 17:26:32 -0400", "event_type": "REGISTER", "archive": 0, "mount_point": "/mnt/jovian", "uuid": "ea370e22-b5ac-ab98-71bb-605d217071f7"}
{"event_time": "2013-09-27 17:26:39 -0400", "event_type": "RESTORE_START", "total_bytes": 0, "lustre_path": "CentOS-6.4-x86_64-bin-DVD2.iso", "fid": "0x200000400:0x24:0x0"}
{"event_time": "2013-09-27 17:26:39 -0400", "event_type": "RESTORE_RUNNING", "current_bytes": 0, "total_bytes": 1452388352, "lustre_path": "CentOS-6.4-x86_64-bin-DVD2.iso", "fid": "0x200000400:0x24:0x0"}
{"event_time": "2013-09-27 17:27:09 -0400", "event_type": "RESTORE_RUNNING", "current_bytes": 681574400, "total_bytes": 1452388352, "lustre_path": "CentOS-6.4-x86_64-bin-DVD2.iso", "fid": "0x200000400:0x24:0x0"}
{"event_time": "2013-09-27 17:27:09 -0400", "event_type": "RESTORE_CANCEL", "fid": "0x200000400:0xa3:0x0"}
{"event_time": "2013-09-27 17:27:14 -0400", "event_type": "UNREGISTER", "archive": 0, "mount_point": "/mnt/jovian", "uuid": "ea370e22-b5ac-ab98-71bb-605d217071f7"} |
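For anyone who wants to consume the stream programmatically rather than with cat, here is a minimal reader sketch in Python. It is not part of the patch. It assumes one JSON object per line (the sample output suggests this, but the patch does not promise it), and the function names read_events, progress and watch are made up for illustration. Lines that fail to parse, such as a truncated write, are simply skipped.

```python
import json


def read_events(stream):
    """Yield one dict per well-formed event line.

    Assumption: the copytool writes one JSON object per line.
    Blank lines and lines that do not parse (e.g. a partial write)
    are dropped rather than treated as fatal.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and "event_type" in event:
            yield event


def progress(event):
    """Return percent complete for an event carrying byte counters, else None."""
    total = event.get("total_bytes")
    current = event.get("current_bytes")
    if not total or current is None:
        return None
    return 100.0 * current / total


def watch(fifo_path="/var/spool/hsm_events"):
    """Print a one-line summary per event; open() blocks until a writer attaches."""
    with open(fifo_path) as fifo:
        for event in read_events(fifo):
            pct = progress(event)
            suffix = "" if pct is None else " (%.1f%%)" % pct
            print(event["event_type"] + suffix)
```

Calling watch() in one shell and then starting the copytool in another should give the same view as the cat example, with a percentage appended to the RESTORE_RUNNING lines once total_bytes is known.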
| Comment by Oleg Drokin [ 30/Sep/13 ] |
|
I guess you mean liblustreapi and not liblustre in the ticket description? |
| Comment by Michael MacDonald (Inactive) [ 30/Sep/13 ] |
|
Hmm, yes. Updated the description, thanks. |
| Comment by Michael MacDonald (Inactive) [ 11/Oct/13 ] |
|
I thought that it might be best to move discussion about this work from gerrit to this ticket. As I indicated in the commit message for the review I pushed, my intent was to prove the concept and get some feedback on the plan, so I am happy to see this conversation happening. I'll respond to general feedback on some of the higher-level topics here so that we can keep the discussion going:

jhammond: I did consider a socket-based (unix, udp) approach, but it seemed to add complexity to the implementation without really adding much benefit over the FIFO approach. My goal wasn't to make a completely reliable event stream; I was thinking more of making it best effort. If there is a reader to see the events, great. If not, life goes on and there's no negative impact on the copytool instance. I was careful to handle cases where the copytool started without a reader (works OK) or where the reader disappeared at various points (OK, in my testing).

adilger: JSON is actually a subset of YAML. YAML parsers can read JSON just fine, though the reverse isn't true. I decided to use JSON because the event format doesn't need all of YAML's capabilities, and it's much easier to generate correct JSON. JSON is also easier to validate on the reader side because it's simple. It's very easy to detect partial writes of JSON-formatted events, for example. All that having been said, I'm not opposed to the idea of using pure YAML, especially if someone else is writing or linking in a YAML library. In my opinion, though, JSON is probably good enough for 99% of what we need as far as structured output goes.

jcl: I will certainly add tests for the final implementation, but thank you for calling it out. My initial focus was to get some code working in order to test ideas and generate discussion before committing to a final design.

As far as libraries go, I am not opposed to the idea of using a well-tested library to generate JSON and/or YAML; I just wasn't sure what the reception would be to adding external dependencies like that. There is an MIT-licensed library called Jansson that seems mature and well-maintained.

I think that covers most of the high-level topics. There's been a lot of really great feedback on implementation details too, and I appreciate that. I will certainly incorporate those improvements into the code as I make progress. |
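To make the best-effort behaviour concrete, here is a small Python sketch of the writer side. It is an illustration of the idea only, not the liblustreapi code, and the helper names open_fifo_best_effort and emit_event are hypothetical. It relies on standard POSIX FIFO semantics: a non-blocking open for writing fails with ENXIO when no reader is attached, and a write after the reader has gone away fails with EPIPE.

```python
import errno
import json
import os


def open_fifo_best_effort(path):
    """Return a write fd for the FIFO, or None when no reader is attached.

    O_NONBLOCK makes open() fail with ENXIO instead of blocking the
    caller until a reader shows up.
    """
    try:
        return os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return None
        raise


def emit_event(fd, event):
    """Write one event as a single JSON line; return True if it was delivered.

    A vanished reader (EPIPE) or a full pipe (EAGAIN) is reported as
    False so the caller can carry on instead of failing.
    """
    data = (json.dumps(event) + "\n").encode("utf-8")
    try:
        return os.write(fd, data) == len(data)
    except (BrokenPipeError, BlockingIOError):
        return False
```

With this shape the caller can ignore a None descriptor or a False return and keep copying data, which matches the "life goes on" intent above. Python ignores SIGPIPE by default, so the EPIPE case surfaces as an exception; a C implementation would need to ignore or block SIGPIPE itself.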
| Comment by Keith Mannthey (Inactive) [ 08/Nov/13 ] |
|
Why not just get info from the policy agent? It knows the state of the entire FS. |
| Comment by jacques-charles lafoucriere [ 08/Nov/13 ] |
|
What do you mean by "policy agent"? The "coordinator" or the "policy engine" (i.e. RBH)? The initial idea was to provide a standard interface for external backend tools. Since the CT runs on a standard Lustre client, implementing it in liblustreapi for the CT is the natural way. |
| Comment by Keith Mannthey (Inactive) [ 08/Nov/13 ] |
|
I mean Robinhood. I believe it is aware of the state of the filesystem. It seems this plan is to capture the profile it exports in some other layer of Lustre. Why not just use Robinhood (or some other Policy Agent) directly to resolve the state of the filesystem? |
| Comment by jacques-charles lafoucriere [ 11/Nov/13 ] |
|
RBH runs on some client. Unlike the agent, this client may not be connected to the external storage, so it would have difficulty communicating with it. Also, there is only a single instance of RBH, so we may have a scalability issue. The agent count is easy to increase, so even a highly verbose CT will scale. |
| Comment by Bruno Faccini (Inactive) [ 25/Feb/14 ] |
|
I have pushed a new patch-set #6 for http://review.whamcloud.com/7790 where, after a rebase, I tried to address the multiple comments from previous patch-sets. Andreas, is the new liblustreapi_json.c what you wanted? I am not really familiar with these licensing protocols or their packaging needs … Do the specific data-structure definitions being used also need to be in a separate .h file with the appropriate header? Jinshan, I did not remove the head-list structure llapi_json_item_list, because I find the code easier to read with it than without. |
| Comment by Michael MacDonald (Inactive) [ 26/Feb/14 ] |
|
Attached a small proof-of-concept for generating valid JSON with PyYAML. The main takeaway is that the JSON we are currently generating is indeed valid YAML. Given that the DLC patches that bring in libyaml have not yet landed on master, and won't land for 2.5.x, I propose that we move forward with the existing simple JSON generator, but plan to replace it with libyaml when that becomes available. |
| Comment by Bob Glossman (Inactive) [ 05/Mar/14 ] |
|
backport to b2_5; |
| Comment by Peter Jones [ 05/Mar/14 ] |
|
Landed for 2.6 |