[LU-16557] don't skip add_conn with -o network mount option Created: 15/Feb/23 Updated: 11/Apr/23 Resolved: 01/Mar/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0, Lustre 2.15.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Mikhail Pershin | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
The mount option -o network restricts which network a client uses to access servers. It filters networks while the 'setup' configuration command is processed and skips every 'add_conn' command whose connection UUID does not mention the restricted network. However, the UUID in add_conn is just a name, and the real NIDs attached to it may be on the correct network. Skipping such an add_conn also skips any failover NIDs and leaves the client with no knowledge of the failover nodes. For example, in the configuration below:
- { index: 68, event: add_uuid, nid: 10.160.44.6@tcp(0x200000aa02c06), node: 10.160.44.6@tcp }
- { index: 69, event: add_uuid, nid: 10.160.44.38@tcp1(0x200010aa02c26), node: 10.160.44.6@tcp }
- { index: 85, event: attach, device: lustre-MDT0002-mdc, type: mdc, UUID: lus27-clilmv_UUID }
- { index: 86, event: setup, device: lustre-MDT0002-mdc, UUID: lustre-MDT0002_UUID, node: 10.160.44.6@tcp }
### here, after setup, both @tcp and @tcp1 NIDs are filtered and the latter is kept; note that the connection UUID used in 'setup' has the "@tcp" network, and that is not taken into account properly
### below is the result of the --servicenode option: first the NIDs, and finally the add_conn that uses them:
- { index: 87, event: add_uuid, nid: 10.160.44.6@tcp(0x200000aa02c06), node: 10.160.44.6@tcp }
- { index: 88, event: add_uuid, nid: 10.160.44.38@tcp1(0x200010aa02c26), node: 10.160.44.6@tcp }
- { index: 103, event: add_conn, device: lustre-MDT0002-mdc, node: 10.160.44.6@tcp }
### second node
- { index: 104, event: add_uuid, nid: 10.160.44.7@tcp(0x200000aa02c07), node: 10.160.44.7@tcp }
- { index: 105, event: add_uuid, nid: 10.160.44.39@tcp1(0x200010aa02c27), node: 10.160.44.7@tcp }
- { index: 120, event: add_conn, device: lustre-MDT0002-mdc, node: 10.160.44.7@tcp }
Each 'add_conn' was configured with a NID on the restricted network "@tcp1", but the 'add_conn' is skipped because its own name does not mention "tcp1". As a result, the client mounted without the second node at address 10.160.44.39@tcp1 and can't connect to the server during failover. |
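The difference between filtering by connection name and filtering by attached NIDs can be illustrated with a small Python model of the records above. This is only a sketch, not Lustre source: the `uuid_nids` table, `net_of`, `add_conn_by_name`, and `add_conn_by_nids` are hypothetical names invented for the illustration, and the data is copied from the add_uuid entries in the description.

```python
# Hypothetical model of the config-log records quoted above; not Lustre code.

# add_uuid records: connection UUID (just a name) -> NIDs attached to it.
uuid_nids = {
    "10.160.44.6@tcp": ["10.160.44.6@tcp", "10.160.44.38@tcp1"],
    "10.160.44.7@tcp": ["10.160.44.7@tcp", "10.160.44.39@tcp1"],
}


def net_of(nid):
    """Return the network part of a NID string, e.g. 'tcp1'."""
    return nid.rsplit("@", 1)[1]


def add_conn_by_name(uuid, net):
    """Reported behaviour: skip the whole add_conn if the UUID name is off-net."""
    if net_of(uuid) != net:
        return []  # add_conn skipped, failover NIDs are lost
    return [n for n in uuid_nids[uuid] if net_of(n) == net]


def add_conn_by_nids(uuid, net):
    """Desired behaviour: keep every attached NID on the restricted network."""
    return [n for n in uuid_nids[uuid] if net_of(n) == net]


if __name__ == "__main__":
    for uuid in uuid_nids:
        print(uuid, add_conn_by_name(uuid, "tcp1"), add_conn_by_nids(uuid, "tcp1"))
```

In this model the name-based check yields no NIDs for either node, while the NID-based check keeps 10.160.44.38@tcp1 and 10.160.44.39@tcp1.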
| Comments |
| Comment by Mikhail Pershin [ 15/Feb/23 ] |
|
https://review.whamcloud.com/#/c/fs/lustre-release/+/49986/ - the proposed patch saves the restricted network info in the import during 'setup' command processing, so the restriction can be applied each time import_set_conn() is called. It is therefore applied on 'add_conn' in the same manner, and I assume that will allow mounting with -o network when LNet Dynamic Discovery is enabled, since that uses the same code to add connections. |
| Comment by Gerrit Updater [ 01/Mar/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49986/ |
| Comment by Peter Jones [ 01/Mar/23 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 02/Mar/23 ] |
|
"Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50187 |
| Comment by Gerrit Updater [ 11/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50187/ |