[LUDOC-523] add proper documentation for replace_nids command Created: 10/Jan/19  Updated: 09/Jan/24

Status: Open
Project: Lustre Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: Artem Blagodarenko
Resolution: Unresolved Votes: 1
Labels: LTS12

Issue Links:
Duplicate
is duplicated by LU-15418 I need clarification on using lctl re... Open
Related
is related to LU-10384 Replace nids doesn't add failover nid... Resolved

 Description   

Please add a separate lctl-replace_nids.8 man page for the replace_nids command, and update the user manual in a similar manner. The current entries in the lctl.8 man page and in the manual do not explain what the various NIDs mean:

       replace_nids <devicename> <nid1>[,nid2,nid3:nid4,nid5:nid6 ...]
              Replace the LNET Network Identifiers for a given device, as when
              the server's IP address has changed.  This command must  be  run
              on  the  MGS  node.   Only MGS server should be started (command
              execution returns error in another cases). To start the MGS ser-
              vice  only:  mount  -t  lustre  <MDT  partition> -o nosvc <mount
              point> Note  the  replace_nids  command  skips  any  invalidated
              records in the configuration log.  The previous log is backed up
              with the suffix '.bak'.  Failover nids must be passed after  ':'
              symbol.  More  then one failover can be set (every failover nids
              after ':' symbol).

This could be improved in several ways:

  • Don't use generic placeholders like "nid1,nid2,nid3,nid4" in the description; use descriptive names such as original_nid and nodeA_nid1,nodeA_nid2:nodeB_nid1,nodeB_nid2 or similar.
  • Explain the behaviour for failover NIDs, as described in the commit message of patch https://review.whamcloud.com/30624 "LU-10384 mgs: replace_nids large string and failover support".
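As an illustration of the grouping that the man page text implies, here is a minimal Python sketch that splits a NID list on ':' (one group per node, with failover groups after the first ':') and then on ',' (the NIDs within a group). The function name split_nid_list is hypothetical, and this is only a reading of the quoted man page text, not lctl's actual parser. It does not settle which NID, if any, is the "original" one.

```python
from typing import List


def split_nid_list(nid_arg: str) -> List[List[str]]:
    """Split 'a,b:c,d' into [['a', 'b'], ['c', 'd']].

    Assumption (from the quoted man page): ':' separates per-node
    groups, with failover groups after the first ':', and ','
    separates the NIDs inside one group.
    """
    groups = []
    for group in nid_arg.split(":"):
        nids = [nid.strip() for nid in group.split(",") if nid.strip()]
        if not nids:
            # An empty group (e.g. 'a::b' or a trailing ':') is ambiguous.
            raise ValueError("empty NID group in %r" % nid_arg)
        groups.append(nids)
    return groups


if __name__ == "__main__":
    # Placeholder names taken from the suggestion in this ticket.
    first, *failover = split_nid_list(
        "nodeA_nid1,nodeA_nid2:nodeB_nid1,nodeB_nid2"
    )
    print("first group:", first)
    print("failover groups:", failover)
```

With descriptive placeholder names, the output makes the per-node grouping visible, which the generic nid1..nid6 names do not.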

The choice of syntax for the replace_nids command is also very confusing, since the nid1,nid2,nid3:nid4,nid5 argument could easily be misparsed if only the "new" NIDs are given, as in nid2,nid3:nid4,nid5. It would be better to use getopt_long() and provide named arguments like:

lctl replace_nids <target> --orig <original_nid> --new <nodeA_nid1>,<nodeA_nid2> --failnode <nodeB_nid1>,<nodeB_nid2>

or --servicenode or something similar. This would make it clear which NID is being replaced and which NIDs are being added, and would allow other options to be added in the future.
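The named-argument proposal can be prototyped as follows. This is a rough sketch only: it uses Python's argparse as a stand-in for getopt_long(), the option names --orig, --new and --failnode are the ones proposed in this ticket and are not an existing lctl interface, and making --failnode repeatable (one use per failover node) is an assumption. The target name and NIDs in the usage example are made up.

```python
import argparse


def nid_list(value):
    """Parse a comma-separated NID list such as 'nid1,nid2'."""
    nids = [nid.strip() for nid in value.split(",") if nid.strip()]
    if not nids:
        raise argparse.ArgumentTypeError("expected at least one NID")
    return nids


def build_parser():
    # Hypothetical interface modelled on the proposal in this ticket.
    parser = argparse.ArgumentParser(prog="replace_nids")
    parser.add_argument("target", help="device name, e.g. an MDT or OST")
    parser.add_argument("--orig", required=True,
                        help="NID being replaced")
    parser.add_argument("--new", required=True, type=nid_list,
                        dest="new_nids",
                        help="comma-separated NIDs of the new node")
    # Assumption: --failnode may be repeated, once per failover node.
    parser.add_argument("--failnode", action="append", default=[],
                        type=nid_list,
                        help="comma-separated NIDs of one failover node")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "lustre-MDT0000",                      # made-up target name
        "--orig", "10.0.0.1@tcp",
        "--new", "10.0.1.1@tcp,10.0.1.2@tcp",
        "--failnode", "10.0.2.1@tcp,10.0.2.2@tcp",
    ])
    print(args.target, args.orig, args.new_nids, args.failnode)
```

Because each role has its own option, omitting the original NID becomes a usage error rather than a silent misparse of the positional list.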



 Comments   
Comment by Peter Jones [ 12/Jan/19 ]

Adding Cory for visibility
