[LU-12821] Replace the posix copytool with robinhood's generic copytool Created: 30/Sep/19  Updated: 05/Aug/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: CEA Assignee: Dominique Martinet (Inactive)
Resolution: Unresolved Votes: 0
Labels: HSM

Rank (Obsolete): 9223372036854775807

 Description   

As discussed at LAD (with Andreas/Nathan), replacing the posix copytool with robinhood's generic copytool might be interesting - robinhood isn't a good place for it to begin with, and it looks like the posix copytool hasn't had much love.

 

Robinhood's generic copytool can be found here:

https://github.com/cea-hpc/robinhood/blob/master/src/tools/lhsmtool_cmd.c

 

Unfortunately it makes extensive use of glib, which I'd rather not pull in for lustre tools, so we're talking about a full rewrite... I had forgotten that point last week.

  • the most problematic might be the g_shell_parse_argv function, which could probably be replaced by wordexp(3) though, even if I'm not too keen on how overpowered it is.
  • the regexp replaces for {fd} etc are simple replaces we could do manually with strstr and friends, but if we go with wordexp we could just say the syntax changed and define some $FD, $FID and $CTDATA variables for wordexp to expand by itself - as long as people quote variables properly it does a good job of expanding as one would expect.
  • the ini config file parsing... Could switch to yaml or something else we already have some of, I guess.
  • g_thread/async queue: I'd say a single thread popping events and forking as required is going to be just as efficient, and allows much simpler code... Do hsm-related actions (getting fid, ctdata etc) before fork to be safe, and slower stuff in the child process (open & execve basically), and ct_fini upon receiving a SIGCHLD (some eventfd for this, and polling on the kuc->lk_rfd and that, perhaps? I would rather not do more in the signal handler, even if the single thread model assumes ct_fini will be fast...)
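
To make the wordexp(3) idea from the list above concrete, here is a minimal sketch, not a proposed implementation. The helper name ct_expand_cmd is hypothetical, and the variable names FD, FID and CTDATA are only the convention suggested above. It assumes the variables are exported with setenv() (in the forked child, in the model described above) and that command substitution should be refused via WRDE_NOCMD.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <wordexp.h>

/*
 * Hypothetical helper: expand a command template into an argv-style word
 * list with wordexp(3). FD, FID and CTDATA are assumed variable names taken
 * from the proposal, not an existing interface.
 * Returns 0 on success, -1 if setenv() fails, or a wordexp(3) error code.
 * On success the caller releases the result with wordfree().
 */
static int ct_expand_cmd(const char *tmpl, int fd, const char *fid,
                         const char *ctdata, wordexp_t *we)
{
    char fdbuf[16];

    snprintf(fdbuf, sizeof(fdbuf), "%d", fd);

    /* This changes the process environment, so it fits best in the child. */
    if (setenv("FD", fdbuf, 1) != 0 || setenv("FID", fid, 1) != 0 ||
        setenv("CTDATA", ctdata, 1) != 0)
        return -1;

    /*
     * WRDE_NOCMD refuses $(...) and backquotes; WRDE_UNDEF turns a
     * reference to an unset variable into an error.
     */
    return wordexp(tmpl, we, WRDE_NOCMD | WRDE_UNDEF);
}
```

On success, we_wordv is a NULL-terminated array that could be handed to execvp(3) directly. A template such as `cat /proc/self/fd/$FD "$CTDATA"` keeps a ctdata value containing spaces as a single argument only because the variable is quoted, which is the quoting caveat mentioned above.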

All in all that seems to be much more work than I thought it'd be last week, but it's not that much code either (<1kloc) so we might be able to make it happen somehow... I'd still say writing new code would be faster than trying to adapt the current posix copytool so I wouldn't go that way, but happy to take opinions here, and this LU can be used as a reminder for myself.
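
The single-threaded model from the list above (fork per request, a SIGCHLD handler that only pokes an eventfd, and reaping from the main loop) might look roughly like the sketch below. All function names are hypothetical. It is Linux-specific because of eventfd(2), and it leaves out the HSM side entirely: a real copytool would add the kuc->lk_rfd descriptor as a second pollfd and call its completion routine where the comment indicates.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int ct_child_efd = -1;

/* SIGCHLD handler: only bump the eventfd counter (write(2) is signal-safe). */
static void ct_on_sigchld(int sig)
{
    uint64_t one = 1;
    int saved_errno = errno;

    (void)sig;
    if (write(ct_child_efd, &one, sizeof(one)) < 0) {
        /* Nothing safe to do here; the main loop reaps on its next wakeup. */
    }
    errno = saved_errno;
}

/* Create the eventfd and install the handler; returns the fd to poll or -1. */
static int ct_child_notify_init(void)
{
    struct sigaction sa = { 0 };

    ct_child_efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (ct_child_efd < 0)
        return -1;

    sa.sa_handler = ct_on_sigchld;
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) < 0)
        return -1;
    return ct_child_efd;
}

/* Fork and exec argv; the parent gets the child's pid, or -1 on failure. */
static pid_t ct_spawn(char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }
    return pid;
}

/*
 * Wait up to timeout_ms for a child-exit notification, then reap every
 * finished child. Returns the number of children reaped.
 */
static int ct_reap_children(int timeout_ms)
{
    struct pollfd pfd = { .fd = ct_child_efd, .events = POLLIN };
    uint64_t count;
    int rc, status, reaped = 0;

    /* poll(2) is not restarted after a signal, so retry on EINTR. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        return 0;

    /* Drain the counter; several exits may be folded into one wakeup. */
    if (read(ct_child_efd, &count, sizeof(count)) < 0)
        return 0;

    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++; /* per-action completion (ct_fini above) would go here */
    return reaped;
}
```

Keeping the handler down to a single write(2) matches the wish above not to do real work in signal context; the cost is that completion handling runs in the main loop, so it has to stay fast.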



 Comments   
Comment by Peter Jones [ 30/Sep/19 ]

Dominique

Do I understand correctly that you intend to work on this?

Peter

Comment by Dominique Martinet (Inactive) [ 30/Sep/19 ]

Hi Peter,

I don't think anyone else will, so yes - unless you have a keen interest and resources for it, in which case by all means please do!
I've just assigned the LU to myself. Comments on design before I start would be welcome though.

I'm actually on PTO for the next few weeks and this won't ever be a priority, but as I said it shouldn't be too bad to do, so it will happen eventually.

Comment by Peter Jones [ 30/Sep/19 ]

I think that you getting back to this after your PTO is definitely the fastest path to it getting attention

Comment by Ben Evans (Inactive) [ 18/Oct/19 ]

Would this be bringing the Robinhood copytool into the Lustre codebase?

Comment by Andreas Dilger [ 21/Oct/19 ]

Ben, that is my thought, yes. That would allow us to get rid of the old lhsm_posix copytool, and instead use the generic interface that allows using a variety of tools to do the data copy to different backends, instead of having to write a dedicated backend for each target type.

Note that I haven't looked into the code in detail, but this is my understanding at least.

Comment by Andreas Dilger [ 22/Oct/19 ]

Dominique, is there a man page or other documentation for the lhsmtool_cmd tool? Looking briefly at the source, it seems that the only options for the command-line tool are to use "{fid}" and "{fd}".

Comment by Dominique Martinet (Inactive) [ 25/Oct/19 ]

Andreas, there is a man page - you can find it in .rst format here https://github.com/cea-hpc/robinhood/blob/master/man/lhsmtool_cmd.rst (the man page is in the git tree as well)

As far as I understand it also replaces {ctdata}, but I am not sure how it is used and the man page does not mention it.
Either way, since the internals will have to be reworked we can adjust the options a bit as well; if you have any comments, now is a good time.
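
For reference, if the existing {fd}, {fid} and {ctdata} placeholders were kept, the manual replacement with strstr mentioned in the description could be as small as the sketch below. The function name ct_replace_all is hypothetical and this is not how lhsmtool_cmd currently does it; whether to substitute before or after splitting the command into argv is a design choice left open here.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of `in` with every
 * occurrence of `key` replaced by `val`. Returns NULL if `key` is empty
 * or if allocation fails. The caller frees the result.
 */
static char *ct_replace_all(const char *in, const char *key, const char *val)
{
    size_t klen = strlen(key), vlen = strlen(val);
    size_t count = 0;
    const char *p;
    char *out, *w;

    if (klen == 0)
        return NULL;

    /* First pass: count matches so the output can be sized exactly. */
    for (p = in; (p = strstr(p, key)) != NULL; p += klen)
        count++;

    out = malloc(strlen(in) - count * klen + count * vlen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the literal text and substitute each match. */
    w = out;
    while ((p = strstr(in, key)) != NULL) {
        memcpy(w, in, (size_t)(p - in));
        w += p - in;
        memcpy(w, val, vlen);
        w += vlen;
        in = p + klen;
    }
    strcpy(w, in); /* copy the remaining tail, including the terminator */
    return out;
}
```

Calling it once per placeholder, for example with "{fd}" and then "{fid}", yields the final command string.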

Generated at Sat Feb 10 02:55:56 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.