[LU-16557] don't skip add_conn with -o network mount option Created: 15/Feb/23  Updated: 11/Apr/23  Resolved: 01/Mar/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0, Lustre 2.15.3

Type: Bug Priority: Major
Reporter: Mikhail Pershin Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11057 Client mount option "-o network=net" ... Resolved
is related to LU-7845 Support namespace in credentials retr... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The mount option -o network restricts which network the client uses to access servers. It filters networks while processing the 'setup' configuration command and skips every 'add_conn' command whose connection UUID does not mention the correct network. However, the UUID in 'add_conn' is just a name, and the real NIDs attached to it may be on the correct network. Skipping such an 'add_conn' also skips any failover NIDs it carries, which leaves the client with no knowledge of the failover nodes.
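To make the problem concrete, here is a minimal Python model of the name-based check described above. It is not Lustre's C code: `nid_net` and `old_add_conn_allowed` are hypothetical helpers, and network names are compared literally (no "tcp" == "tcp0" normalization). It shows that a UUID named "...@tcp" fails a "tcp1" restriction even though one of its NIDs is on tcp1.

```python
def nid_net(nid: str) -> str:
    """Return the LNet network part of a NID or UUID string, e.g. 'tcp1' for '10.160.44.38@tcp1'."""
    # Assumption: the network is whatever follows the last '@', compared literally.
    return nid.rsplit("@", 1)[-1]


def old_add_conn_allowed(conn_uuid: str, restricted_net: str) -> bool:
    """Hypothetical model of the pre-fix check: decide from the UUID *name* only."""
    return nid_net(conn_uuid) == restricted_net


# The connection UUID is only a name; the NIDs registered for it are what matter.
uuid_nids = {
    "10.160.44.7@tcp": ["10.160.44.7@tcp", "10.160.44.39@tcp1"],
}

uuid = "10.160.44.7@tcp"
print(old_add_conn_allowed(uuid, "tcp1"))                     # False: the whole add_conn is skipped
print([n for n in uuid_nids[uuid] if nid_net(n) == "tcp1"])   # ['10.160.44.39@tcp1'] is lost with it
```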

E.g. in configuration below:

- { index: 68, event: add_uuid, nid: 10.160.44.6@tcp(0x200000aa02c06), node: 10.160.44.6@tcp }
- { index: 69, event: add_uuid, nid: 10.160.44.38@tcp1(0x200010aa02c26), node: 10.160.44.6@tcp }
- { index: 85, event: attach, device: lustre-MDT0002-mdc, type: mdc, UUID: lus27-clilmv_UUID }
- { index: 86, event: setup, device: lustre-MDT0002-mdc, UUID: lustre-MDT0002_UUID, node: 10.160.44.6@tcp }

### after 'setup', both the @tcp and @tcp1 NIDs are filtered and only the latter is kept; note that the connection UUID used in 'setup' names the "@tcp" network, and that is not taken into account properly

### below is the result of the --servicenode option: first the NIDs, then the add_conn that uses them:

- { index: 87, event: add_uuid, nid: 10.160.44.6@tcp(0x200000aa02c06), node: 10.160.44.6@tcp }
- { index: 88, event: add_uuid, nid: 10.160.44.38@tcp1(0x200010aa02c26), node: 10.160.44.6@tcp }
- { index: 103, event: add_conn, device: lustre-MDT0002-mdc, node: 10.160.44.6@tcp }

### second node
- { index: 104, event: add_uuid, nid: 10.160.44.7@tcp(0x200000aa02c07), node: 10.160.44.7@tcp }
- { index: 105, event: add_uuid, nid: 10.160.44.39@tcp1(0x200010aa02c27), node: 10.160.44.7@tcp }
- { index: 120, event: add_conn, device: lustre-MDT0002-mdc, node: 10.160.44.7@tcp }

Each 'add_conn' refers to a UUID that has a NID on the restricted network "@tcp1", but the 'add_conn' is skipped because its UUID name does not mention "tcp1". As a result, the client is mounted without the second node at 10.160.44.39@tcp1 and cannot connect to the server during failover.
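The sequence above can be replayed with a small Python model of the pre-fix processing. This is an illustrative sketch, not Lustre code: `replay_pre_fix` and `net_of` are hypothetical, the records are reduced to (event, node UUID, NID), and it assumes that an 'add_conn' which is not skipped would filter its NIDs by network just as 'setup' does. With `-o network=tcp1`, only the primary's tcp1 NID survives.

```python
# Records from the configuration log in this ticket: (event, node UUID, nid).
RECORDS = [
    ("add_uuid", "10.160.44.6@tcp", "10.160.44.6@tcp"),    # index 68
    ("add_uuid", "10.160.44.6@tcp", "10.160.44.38@tcp1"),  # index 69
    ("setup", "10.160.44.6@tcp", None),                    # index 86
    ("add_uuid", "10.160.44.6@tcp", "10.160.44.6@tcp"),    # index 87
    ("add_uuid", "10.160.44.6@tcp", "10.160.44.38@tcp1"),  # index 88
    ("add_conn", "10.160.44.6@tcp", None),                 # index 103
    ("add_uuid", "10.160.44.7@tcp", "10.160.44.7@tcp"),    # index 104
    ("add_uuid", "10.160.44.7@tcp", "10.160.44.39@tcp1"),  # index 105
    ("add_conn", "10.160.44.7@tcp", None),                 # index 120
]


def net_of(s):
    # Assumption: network = text after the last '@', compared literally.
    return s.rsplit("@", 1)[-1]


def replay_pre_fix(records, restricted_net):
    """Hypothetical model of pre-fix processing; returns the NIDs the client can use."""
    uuid_nids = {}  # connection UUID (just a name) -> NIDs registered by add_uuid
    conns = []      # NIDs the client ends up able to connect to

    def add_filtered(uuid):
        for n in uuid_nids.get(uuid, []):
            if net_of(n) == restricted_net and n not in conns:
                conns.append(n)

    for event, uuid, nid in records:
        if event == "add_uuid":
            uuid_nids.setdefault(uuid, [])
            if nid not in uuid_nids[uuid]:
                uuid_nids[uuid].append(nid)
        elif event == "setup":
            add_filtered(uuid)        # 'setup' filters the NIDs by network
        elif event == "add_conn":
            if net_of(uuid) != restricted_net:
                continue              # pre-fix: skipped because of the UUID *name*
            add_filtered(uuid)
    return conns


print(replay_pre_fix(RECORDS, "tcp1"))  # ['10.160.44.38@tcp1']: failover 10.160.44.39@tcp1 is missing
```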



 Comments   
Comment by Mikhail Pershin [ 15/Feb/23 ]

https://review.whamcloud.com/#/c/fs/lustre-release/+/49986/ - the proposed patch saves the restricted network info in the import while processing the 'setup' command, so the restriction can be applied each time import_set_conn() is called. It is therefore applied to 'add_conn' in the same manner. I assume this will also allow mounting with -o network while LNet Dynamic Discovery is enabled, since Dynamic Discovery uses the same code to add connections.
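The idea of the patch can be modeled in a few lines of Python. This is a hedged sketch of the design described in the comment, not the actual change: `ImportModel`, `setup` and `set_conn` are hypothetical names (the last only loosely mirrors import_set_conn()), and network names are compared literally. The restriction is remembered once at 'setup' and then applied per NID on every connection that is added, so the failover node's tcp1 NID is kept.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def net_of(nid: str) -> str:
    # Assumption: network = text after the last '@', compared literally ('tcp' != 'tcp0' here).
    return nid.rsplit("@", 1)[-1]


@dataclass
class ImportModel:
    """Hypothetical stand-in for a client import; not Lustre's struct obd_import."""
    restricted_net: Optional[str] = None
    conns: List[str] = field(default_factory=list)

    def setup(self, nids: List[str], restricted_net: str) -> None:
        # Remember the -o network restriction in the import at 'setup' time...
        self.restricted_net = restricted_net
        self.set_conn(nids)

    def set_conn(self, nids: List[str]) -> None:
        # ...and apply it to each NID on every call, so 'setup' and 'add_conn'
        # are filtered the same way: by NID, not by the connection UUID name.
        for nid in nids:
            if self.restricted_net is not None and net_of(nid) != self.restricted_net:
                continue
            if nid not in self.conns:
                self.conns.append(nid)


imp = ImportModel()
imp.setup(["10.160.44.6@tcp", "10.160.44.38@tcp1"], "tcp1")  # setup, index 86
imp.set_conn(["10.160.44.6@tcp", "10.160.44.38@tcp1"])       # add_conn, index 103
imp.set_conn(["10.160.44.7@tcp", "10.160.44.39@tcp1"])       # add_conn, index 120
print(imp.conns)  # ['10.160.44.38@tcp1', '10.160.44.39@tcp1']
```

Keeping the restriction in the import rather than in the record parser means any path that adds connections through the same function gets the same filtering.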

Comment by Gerrit Updater [ 01/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49986/
Subject: LU-16557 client: -o network needs add_conn processing
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c508c9426838f16256223ab0bbd648bfbec25e46

Comment by Peter Jones [ 01/Mar/23 ]

Landed for 2.16

Comment by Gerrit Updater [ 02/Mar/23 ]

"Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50187
Subject: LU-16557 client: -o network needs add_conn processing
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 1f3f62eb8b74e6c29c8b9b62bcfc1884855a15a8

Comment by Gerrit Updater [ 11/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50187/
Subject: LU-16557 client: -o network needs add_conn processing
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 0543381b2f0ea6e2980315765ad34ae37411d36a

Generated at Sat Feb 10 03:28:03 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.