[LU-14608] Adding second network to filesystem Created: 12/Apr/21  Updated: 13/Apr/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.5
Fix Version/s: None

Type: Question/Request Priority: Minor
Reporter: Mahmoud Hanafi Assignee: Amir Shehata (Inactive)
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10360 use Imperative Recovery logs for clie... Open
Rank (Obsolete): 9223372036854775807

 Description   

We have a filesystem using o2ib (ko2iblnd) NIDs. We want to add the option of tcp NIDs. If we add the tcp NIDs dynamically to the servers, do we need to add the new NIDs via tunefs.lustre to the MGS config? Can we add the second NIDs to the MGS configs without taking the filesystem down?



 Comments   
Comment by Andreas Dilger [ 12/Apr/21 ]

mhanafi, do you mean to ask if NIDs can be added without taking the filesystem down?

Comment by Mahmoud Hanafi [ 12/Apr/21 ]

Normally, changing or adding NIDs requires unmounting the target and using writeconf to make the change. We just want to test the new NIDs and don't want to take down/unmount the servers.
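For context, the usual offline writeconf procedure looks roughly like the sketch below. This is exactly the downtime being avoided here; the mount points and device paths are illustrative examples, not taken from this ticket.

```shell
# Sketch of the standard (offline) NID-change procedure, with
# hypothetical device paths. All clients must be unmounted first.

umount /mnt/mdt                          # on the MDS
tunefs.lustre --writeconf /dev/mdtdev    # regenerate the MDT's config logs

umount /mnt/ost0                         # repeat on every OSS/OST
tunefs.lustre --writeconf /dev/ostdev

mount -t lustre /dev/mdtdev /mnt/mdt     # remount MDT first, then OSTs,
mount -t lustre /dev/ostdev /mnt/ost0    # then the clients
```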

Comment by Amir Shehata (Inactive) [ 13/Apr/21 ]

You should be able to dynamically add the NIDs to both the clients and the servers.
Adding the NIDs will trigger a discovery round (if you have discovery on).
After discovery is complete, the clients and the servers will know they can also reach each other via the tcp NIDs.
These tcp NIDs should start being used for filesystem requests without having to add them explicitly via tunefs; this happens because of the Multi-Rail (MR) capability.
When selecting NIDs, LNet looks at the available credits. Because o2ib is normally the faster interconnect, its interfaces will have more credits available than the tcp ones, so o2iblnd will be used more often.
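The dynamic-add-plus-discovery flow described above can be sketched as follows. The interface name eth0 and the peer NID are examples; adapt them to the actual hardware.

```shell
# Hedged sketch: add a tcp LNet network on a running node, then
# trigger a discovery round so the peer learns both NIDs (Multi-Rail).

lnetctl net add --net tcp --if eth0      # bring up the tcp network (example interface)
lctl list_nids                           # should now show both @o2ib and @tcp NIDs

lnetctl discover 10.151.26.117@o2ib     # force discovery against the peer
lnetctl peer show                        # peer entry should list both of its NIDs
```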

Comment by Mahmoud Hanafi [ 13/Apr/21 ]

Here I added the tcp NID to the servers:

nbp1-mds ~ # lctl list_nids
10.151.26.117@o2ib
10.151.26.117@tcp

nbp1-mds ~ # lnetctl global show
global:
    numa_range: 0
    max_intf: 200
    discovery: 1
    drop_asym_route: 0
    retry_count: 0
    transaction_timeout: 200
    health_sensitivity: 0
    recovery_interval: 1

and on the client:

r803i8n1 ~ # lctl list_nids
10.151.40.131@tcp

r803i8n1 ~ # lctl ping 10.151.26.117@tcp
12345-0@lo
12345-10.151.26.117@o2ib
12345-10.151.26.117@tcp

r803i8n1 ~ # lnetctl global show
global:
    numa_range: 0
    max_intf: 200
    discovery: 1
    drop_asym_route: 0
    retry_count: 0
    transaction_timeout: 200
    health_sensitivity: 0
    recovery_interval: 1

Mounting fails:

r803i8n1 ~ # mount -t lustre 10.151.26.117@tcp:/nbp1 /nobackupp1
mount.lustre: mount 10.151.26.117@tcp:/nbp1 at /nobackupp1 failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
Comment by Amir Shehata (Inactive) [ 13/Apr/21 ]

Is r803i8n1 a new client? If so, the method I described above wouldn't work. I got the impression that you wanted existing clients that already have the FS mounted to use the tcp interface; the method I described works in that case, but not for a brand-new client. There is a patch in the pipeline that is meant to address this use case: adding interfaces dynamically and then allowing new clients to use them for mounting.

https://review.whamcloud.com/#/c/40736/
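Until a fix along those lines lands, the conventional way to let brand-new clients mount via the tcp NIDs is to record both server NIDs in the config logs with writeconf. A hedged sketch, with a hypothetical device path (the target must be unmounted first, i.e. this still requires downtime):

```shell
# Illustrative only: re-register a target with both MGS NIDs
# (comma-separated NIDs denote the same node in Lustre syntax),
# so clients mounting fresh via 10.151.26.117@tcp can find the MGS.

tunefs.lustre --writeconf \
    --mgsnode=10.151.26.117@o2ib,10.151.26.117@tcp \
    /dev/ostdev
```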

Generated at Sat Feb 10 03:11:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.