[LUDOC-523] add proper documentation for replace_nids command - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
- LTS12

Rank (Obsolete):
9223372036854775807

Description

Can you please add a separate lctl-replace_nids.8 man page for the replace_nids command, and update the user manual in a similar manner. The current entry in the lctl.8 man page and manual entry are totally lacking in explanation of what the various NIDs mean:

       replace_nids <devicename> <nid1>[,nid2,nid3:nid4,nid5:nid6 ...]
              Replace the LNET Network Identifiers for a given device, as when
              the server's IP address has changed. This command must be run
              on the MGS node. Only MGS server should be started (command
              execution returns error in another cases). To start the MGS ser-
              vice only: mount -t lustre <MDT partition> -o nosvc <mount
              point> Note the replace_nids command skips any invalidated
              records in the configuration log. The previous log is backed up
              with the suffix '.bak'. Failover nids must be passed after ':'
              symbol. More then one failover can be set (every failover nids
              after ':' symbol).
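
The man page text above describes the NID string only informally. Below is a minimal C sketch of the grouping it seems to imply. It assumes, as the nodeA/nodeB naming suggested below indicates, that ',' separates NIDs of the same node and that ':' starts the NID list of the next (failover) node. This is not lctl's actual parser. `split_nid_groups`, `struct node_nids` and the fixed limits are hypothetical. The sketch also deliberately does not decide which NID, if any, is the "original" one, because that is exactly the ambiguity this ticket is about.

```c
#include <string.h>

/* Hypothetical limits for this sketch only; not taken from lctl. */
#define MAX_NODES		8
#define MAX_NIDS_PER_NODE	8
#define MAX_NID_LEN		64

/* NIDs belonging to one server node (primary or failover). */
struct node_nids {
	int count;
	char nids[MAX_NIDS_PER_NODE][MAX_NID_LEN];
};

/*
 * Split an argument such as "nodeA_nid1,nodeA_nid2:nodeB_nid1,nodeB_nid2"
 * into per-node groups: ',' separates NIDs of the same node, and ':' starts
 * a new failover node. Returns the number of nodes found, or -1 for an empty
 * NID (e.g. a trailing separator) or when a sketch limit is exceeded.
 */
static int split_nid_groups(const char *arg, struct node_nids *nodes,
			    int max_nodes)
{
	int n = 0;		/* index of the node currently being filled */
	const char *p = arg;

	if (max_nodes < 1 || *arg == '\0')
		return -1;
	nodes[0].count = 0;

	for (;;) {
		size_t len = strcspn(p, ",:");	/* length of next NID */
		struct node_nids *node = &nodes[n];

		if (len == 0 || len >= MAX_NID_LEN ||
		    node->count >= MAX_NIDS_PER_NODE)
			return -1;
		memcpy(node->nids[node->count], p, len);
		node->nids[node->count][len] = '\0';
		node->count++;

		p += len;
		if (*p == '\0')
			break;
		if (*p == ':') {	/* following NIDs are a failover node */
			if (++n >= max_nodes)
				return -1;
			nodes[n].count = 0;
		}
		p++;			/* skip the separator */
	}
	return n + 1;
}
```

For example, "10.0.0.1@tcp,10.0.1.1@o2ib:10.0.0.2@tcp" yields two nodes: the first with two NIDs and the second (failover) node with one.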

This could be improved in several ways:

- don't use "nid1,nid2,nid3,nid4" in the description, but rather names like original_nid and nodeA_nid1,nodeA_nid2:nodeB_nid1,nodeB_nid2 or similar.
- it should have an explanation of the behaviour for failover NIDs, as was described in the patch https://review.whamcloud.com/30624 "LU-10384 mgs: replace_nids large string and failover support" commit message.

The choice of syntax for the replace_nids command is also very confusing, since the nid1,nid2,nid3:nid4,nid5 part could easily be mis-parsed if only the "new" NIDs are given as nid2,nid3:nid4,nid5. It would be better to use getopt_long() and provide named arguments like:

lctl replace_nids <target> --orig <original_nid> --new <nodeA_nid1>,<nodeA_nid2> --failnode <nodeB_nid1>,<nodeB_nid2>

or --servicenode or something similar. This would make it clear which NID is being replaced, and which NIDs are being added, and allows for other options to be added in the future.
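
The named-argument interface proposed above could be parsed with getopt_long() roughly as follows. This is a sketch of the proposal only. lctl has no --orig/--new/--failnode options today, and `parse_replace_nids` and `struct replace_nids_args` are hypothetical names. The sketch relies on GNU getopt_long() argument permutation, which lets the <target> positional argument appear before the options as in the example command line.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical parsed form of the proposed command line. */
struct replace_nids_args {
	const char *target;	/* device name, e.g. "lustre-MDT0000" */
	const char *orig;	/* --orig: the NID being replaced */
	const char *new_nids;	/* --new: comma-separated NIDs of the node */
	const char *failnode;	/* --failnode: NIDs of a failover node */
};

/* Returns 0 on success, -1 on unknown options or missing arguments. */
static int parse_replace_nids(int argc, char **argv,
			      struct replace_nids_args *args)
{
	static const struct option long_opts[] = {
		{ "orig",     required_argument, NULL, 'o' },
		{ "new",      required_argument, NULL, 'n' },
		{ "failnode", required_argument, NULL, 'f' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	args->target = args->orig = args->new_nids = args->failnode = NULL;
	optind = 0;	/* 0 asks glibc/musl getopt to fully reinitialize */

	while ((c = getopt_long(argc, argv, "o:n:f:", long_opts, NULL)) != -1) {
		switch (c) {
		case 'o':
			args->orig = optarg;
			break;
		case 'n':
			args->new_nids = optarg;
			break;
		case 'f':
			args->failnode = optarg;
			break;
		default:	/* unknown option or missing value */
			return -1;
		}
	}

	/* Exactly one positional argument (the target) must remain, and
	 * both the NID being replaced and its replacement are required. */
	if (optind != argc - 1 || args->orig == NULL || args->new_nids == NULL)
		return -1;
	args->target = argv[optind];
	return 0;
}
```

Accepting --failnode only once is a simplification. A real implementation would probably let it repeat for several failover nodes, which is the case the ':' syntax covers today.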

Issue Links

is duplicated by

LU-15418 I need clarification on using lctl replace_nids

Resolved

is related to

LU-10384 Replace nids doesn't add failover nid and add_conn string to config

Resolved

mentioned in: Page (Whamcloud Community Wiki)

Activity

Matt Rásó-Barnett made changes - 21/Dec/24 12:00 AM

Remote Link

New: This issue links to "Page (Whamcloud Community Wiki)" [ 40840 ]

Artem Blagodarenko made changes - 03/Dec/24 11:33 PM

Status

Original: Open [ 1 ]

New: In Progress [ 3 ]

Artem Blagodarenko made changes - 03/Dec/24 4:00 PM

Description

Original and New differ only in wiki-markup formatting; the text matches the Description above.

Artem Blagodarenko made changes - 03/Dec/24 3:59 PM

Description

Original and New differ only in wiki-markup formatting; the text matches the Description above.

Colin Faber made changes - 09/Jan/24 9:29 PM

Key	Original: LU-11846	New: LUDOC-523
Affects Version/s	Original: Lustre 2.12.0 [ 13495 ]
Affects Version/s	Original: Lustre 2.13.0 [ 14290 ]
Affects Version/s	Original: Lustre 2.10.6 [ 14291 ]
Project	Original: Lustre [ 10000 ]	New: Lustre Documentation [ 10070 ]

Colin Faber made changes - 04/Aug/22 9:48 PM

Assignee

Original: Artem Blagodarenko [ artem_blagodarenko ]

New: Artem Blagodarenko [ ablagodarenko ]

Andreas Dilger made changes - 07/Jan/22 4:18 AM

Link

New: This issue is duplicated by LU-15418

Cory Spitz made changes - 06/Jan/22 7:18 PM

Fix Version/s

Original: Lustre 2.15.0 [ 14791 ]

Peter Jones made changes - 26/Nov/20 4:38 PM

Fix Version/s		New: Lustre 2.15.0 [ 14791 ]
Fix Version/s	Original: Lustre 2.14.0 [ 14490 ]

Peter Jones made changes - 10/Sep/19 1:43 PM

Fix Version/s		New: Lustre 2.14.0 [ 14490 ]
Fix Version/s	Original: Lustre 2.13.0 [ 14290 ]

People

Assignee: Artem Blagodarenko

Reporter: Andreas Dilger

Votes: 1

Watchers: 6

Dates

Created: 10/Jan/19 5:59 AM

Updated: 21/Dec/24 12:00 AM