[LU-12236] Support more than the default root network namespace Created: 27/Apr/19  Updated: 05/Jun/20  Resolved: 07/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.5

Type: Improvement Priority: Minor
Reporter: Aurelien Degremont (Inactive) Assignee: Aurelien Degremont (Inactive)
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-11385 client hit BUG: unable to handle kern... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

Linux supports network namespaces. These namespaces create different network views for different process groups. Each network namespace has its own set of network devices, IP addresses, routing table and TCP stack, including firewall rules.

Network devices and TCP sockets are attached to a specific network namespace and are visible and usable only through it. Since the network namespace feature was added to Linux, LNET has used the default root namespace every time it needs a network namespace reference.
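To see which network namespace a process is attached to, the /proc/<pid>/ns/net links can be compared. The following is a minimal sketch, not part of the proposal; the function name same_netns is hypothetical, and it relies only on the kernel exposing each namespace as a "net:[inode]" link target.

```shell
#!/bin/sh
# Two processes share a network namespace exactly when their
# /proc/<pid>/ns/net links resolve to the same "net:[inode]" identifier.
# Returns 0 if the two links match, 1 if they differ, 2 if a link is unreadable.
same_netns() {
    a=$(readlink "$1") || return 2
    b=$(readlink "$2") || return 2
    [ "$a" = "$b" ]
}
```

For example, "same_netns /proc/self/ns/net /proc/1/ns/net" succeeds when the current shell runs in the same namespace as PID 1, which is normally the root namespace. Reading another process's link typically requires root.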

Containers are a technical solution that relies on cgroups and namespaces, including network namespaces. Using Lustre in a container therefore means using Lustre in a specific network namespace. If this network namespace has a dedicated network card to access the Lustre filesystem, LNET, which only looks in the root namespace, cannot use that card, so the Lustre filesystem cannot be mounted. This feature intends to enable using more than the root namespace for Lustre communication.

Proposal:

Based on a code audit, LNET uses this hardcoded root network namespace in 3 use cases. When:

  • Enumerating all network devices, when configuring the LNET layer
  • Listening on the default LNET socket for connections (by default: 0.0.0.0:988), in the acceptor thread
  • Creating a socket to connect to other LNET peers

As of Lustre 2.10, LNET records the current process network namespace when it sets up a network interface definition internally.
The main idea for implementing this feature is to rely on the current process network namespace instead of the root namespace. This namespace is easily accessible in current->nsproxy->net_ns. Wherever the current process is easily accessible, its namespace will be used. The call chain is updated to pass this value down to the call that needs it. This covers LNET setup, usually done using lnetctl, lctl, or automatically when loading the module with insmod or modprobe.
There are 2 cases where network accesses are made by Lustre service kernel threads. Service threads are always started in the default root namespace. We cannot rely on the current thread namespace for them:

  • When a connection is received, it arrives on a specific network interface, so we know which network namespace is associated with it.
  • When opening a socket to listen for incoming connections, Lustre does not enforce any specific network interface. Ideally we would like to accept connections from all LNET configured interfaces. However, this requires more code changes. Instead of creating the socket in the root namespace, we will use the namespace of the process that created the network listening thread. This keeps the code change limited while still allowing any network namespace on the system to be used instead of the root one.
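One way to check from userspace which namespace the listening socket ended up in is to look for a LISTEN entry on port 988 in that namespace's TCP table. The sketch below is illustrative only: the function name has_listener is hypothetical, and it assumes the acceptor's kernel-created socket is listed in /proc/net/tcp like other in-kernel listeners. It only covers the IPv4 table.

```shell
#!/bin/sh
# Return 0 if a /proc/net/tcp-style table has a LISTEN entry on PORT.
# Usage: has_listener PORT [TABLE]   (TABLE defaults to /proc/net/tcp)
# /proc/net/tcp shows the sockets of the reading process's own network
# namespace, so run this from inside the namespace being checked.
has_listener() {
    port_hex=$(printf '%04X' "$1")
    awk -v port="$port_hex" '
        NR > 1 {
            # local_address is ADDR:PORT, both in uppercase hex
            n = split($2, local, ":")
            # state 0A is TCP_LISTEN
            if (local[n] == port && $4 == "0A")
                found = 1
        }
        END { exit !found }
    ' "${2:-/proc/net/tcp}"
}
```

Running "has_listener 988" in a shell started inside the namespace where LNET was configured, and again in the root namespace, would show where the acceptor is listening, under the assumption stated above.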

 

This is an initial design proposal.



 Comments   
Comment by Gerrit Updater [ 27/Apr/19 ]

Aurelien Degremont (degremoa@amazon.com) uploaded a new patch: https://review.whamcloud.com/34768
Subject: LU-12236 lnet: support non-default network namespace
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2342994c6f271ee6657322a015b4da440efbd318

Comment by Aurelien Degremont (Inactive) [ 27/Apr/19 ]

This patch is a first draft for review and comments

Comment by Aurelien Degremont (Inactive) [ 02/May/19 ]

One more patch that removes dead code, needed before implementing GSS support for network namespaces.

https://review.whamcloud.com/#/c/34794/

Comment by Sebastien Buisson [ 09/May/19 ]

Thanks for the patches Aurélien.

Could you please document here how you are running Lustre in a container? Are you able to share a script that illustrates a use case for which your patch is helpful (ie with the patch it works, without it fails)?

I tried your patch https://review.whamcloud.com/34768 with a privileged Docker container, without sharing the host's network namespace (so the container has its own network namespace).

host# docker run -ti --privileged --name test_1 lustre_cli_1:latest /bin/bash

I used pipework to create a network interface specific to the container

host# pipework br1 test_1 10.128.11.189/21

And then, after installing the lustre package from within the container I tried 'lnetctl net add' on that interface.

container# rpm -ivh /tmp/lustre-2.12.52_97_g2342994-1.el7.x86_64.rpm
container# lnetctl lnet configure ; lnetctl net add --net tcp1 --if eth1

Unfortunately, it failed with the following errors:

[4426281.034106] LNetError: 23358:0:(lib-socket.c:129:lnet_ipif_query()) Can't get flags for interface eth1
[4426281.034599] LNetError: 23358:0:(socklnd.c:2850:ksocknal_startup()) Can't get interface eth1 info: -19
[4426282.035079] LNetError: 105-4: Error -100 starting up LNI tcp

It is probably due to the fact that the interface is looked up in the host's net namespace, not the container's.
I was expecting your patch to address this scenario, am I getting it wrong?

Thanks,
Sebastien.

Comment by Aurelien Degremont (Inactive) [ 09/May/19 ]

"It is probably due to the fact that the interface is looked up in the host's net namespace, not the container's."

Exactly! And this is the use case I'm addressing.

You need to dedicate the interface to a network namespace and run Lustre setup inside that network namespace, like:

host# ip netns add blue
host# ip link set eth1 netns blue
host# ip netns exec blue bash
netns# ifconfig eth1 ....
netns# modprobe ksocklnd
netns# lnetctl lnet configure
netns# lnetctl net add --net tcp1 --if eth1 

 

Comment by Sebastien Buisson [ 10/May/19 ]

pipework is a very popular and convenient way to manipulate network namespaces and containers.
https://github.com/jpetazzo/pipework

Here is a rough list of actions carried out by pipework when running the 'pipework br1 test_1 10.128.11.189/21' command to add an eth1 interface inside the container named test_1 (whose pid is 25578), with network address 10.128.11.189/21 (br1 being the bridge interface created on the host):

rm -f /var/run/netns/25578
ln -s /proc/25578/ns/net /var/run/netns/25578
ip link add name veth1pl25578 mtu 1500 type veth peer name veth1pg25578 mtu 1500
ip link set veth1pl25578 master br1
ip link set veth1pl25578 up
ip link set veth1pg25578 netns 25578
ip netns exec 25578 ip link set veth1pg25578 name eth1
ip netns exec 25578 ip addr add 10.128.11.189/21 brd 10.128.15.255 dev eth1
ip netns exec 25578 ip link set eth1 up

So this is somewhat similar to what you are doing by hand. However, your patch fails to make it work, with the errors given above.

I think the difference is that pipework creates a new interface specifically for the container, in the container namespace. In your case, you are just moving an interface that exists on the host to a different namespace (making it exclusive to this namespace, and preventing it from being used in any other namespace, even on the host). So it might hide some aspects, in particular the fact that some actions carried out by lnetctl may not be done in the expected namespace.
So you should try with a network interface directly created in your 'blue' network namespace, and see how it goes for your patch.
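A possible way to try this without pipework or Docker is to replay the pipework sequence against a named namespace and then run the LNet setup inside it. The script below is only a sketch: the veth names (veth-host, veth-guest), the environment variables and the run helper are hypothetical, the address is reused from the example above, and the bridge br1 is assumed to already exist on the host. It prints the commands by default; set DRY_RUN=0 and run as root to execute them.

```shell
#!/bin/sh
# Sketch: create a veth pair, move one end into a named network namespace,
# and configure LNet on it from inside that namespace.
# All names and values below are illustrative assumptions.
NETNS="${NETNS:-blue}"
BRIDGE="${BRIDGE:-br1}"              # assumed to exist on the host
ADDR="${ADDR:-10.128.11.189/21}"
LNET_NET="${LNET_NET:-tcp1}"
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_lnet_netns() {
    run ip netns add "$NETNS" &&
    run ip link add name veth-host type veth peer name veth-guest &&
    run ip link set veth-host master "$BRIDGE" &&
    run ip link set veth-host up &&
    run ip link set veth-guest netns "$NETNS" &&
    run ip netns exec "$NETNS" ip link set veth-guest name eth1 &&
    run ip netns exec "$NETNS" ip addr add "$ADDR" dev eth1 &&
    run ip netns exec "$NETNS" ip link set eth1 up &&
    run ip netns exec "$NETNS" modprobe ksocklnd &&
    run ip netns exec "$NETNS" lnetctl lnet configure &&
    run ip netns exec "$NETNS" lnetctl net add --net "$LNET_NET" --if eth1
}

setup_lnet_netns
```

Only the network namespace changes here; the mount namespace is still the host's, so this does not reproduce a full container environment.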

By the way, you are always in the host's mount namespace in your tests, because as far as I can see you are only manipulating the network namespace. This is very different when running in actual containers (whether it be Docker containers or just basic lxc containers), and usually you do not have access to Lustre kernel modules for instance, as /lib/modules is not exported to the container.

 

Comment by Sebastien Buisson [ 14/May/19 ]

For the sake of clarity, the test scenario described in the comment https://jira.whamcloud.com/browse/LU-12236?focusedCommentId=246900&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-246900 above fails both with and without patch at https://review.whamcloud.com/34768.

So this is not a regression from this patch, but this use case should be addressed by the patch.

Comment by Peter Jones [ 16/May/19 ]

Aurelien Degremont (degremoa@amazon.com) uploaded a new patch: https://review.whamcloud.com/34794
Subject: LU-12236 gss: remove unused code in gss_svc_upcall.c
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f941b00d3a7379c786d3fbb5b2c35d86d3829c87

Comment by Aurelien Degremont (Inactive) [ 22/May/19 ]

Sébastien, I tried something very similar to the command list you posted (create a pair of veth devs, put one in container with an ip, and use it in lnetctl) and it works for me. I was able to configure the virtual interface.

I did not try with pipework and docker itself as I do not have that handy.

I will use this command list as a baseline to try to make a test for this feature for Maloo.

Comment by Gerrit Updater [ 25/May/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34794/
Subject: LU-12236 gss: remove unused code in gss_svc_upcall.c
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 25b0bf5a23032394055b7b94b3169f5cf4068570

Comment by Peter Jones [ 25/May/19 ]

Landed for 2.13

Comment by Aurelien Degremont (Inactive) [ 27/May/19 ]

This first patch was only a cleanup patch before the real ones. There are at least 2 other ones coming (first one being https://review.whamcloud.com/34768).

Comment by Sebastien Buisson [ 11/Jun/19 ]

Good news, with patchset 5 of https://review.whamcloud.com/34768 , I am able to successfully run the scenario described in the comment https://jira.whamcloud.com/browse/LU-12236?focusedCommentId=246900&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-246900 .

So using a network interface for LNet from within a Docker container works with this patch.

Comment by Aurelien Degremont (Inactive) [ 12/Jun/19 ]

That's good news! I will address your comments in Gerrit.

 

Comment by Gerrit Updater [ 01/Aug/19 ]

Aurelien Degremont (degremoa@amazon.com) uploaded a new patch: https://review.whamcloud.com/35666
Subject: LU-12236 tests: add tests for LNET network namespace
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fa5a642ec4e742533363e81ac62ac50b9e8f484d

Comment by Gerrit Updater [ 15/Aug/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34768/
Subject: LU-12236 lnet: support non-default network namespace
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 93b08edfb1c6ae8aec7e1009d3aca450416358d7

Comment by Gerrit Updater [ 22/Aug/19 ]

Aurelien Degremont (degremoa@amazon.com) uploaded a new patch: https://review.whamcloud.com/35859
Subject: LU-12236 lnet: support non-default network namespace
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: c9be9e0900727c5fc7510da3a169dbdf0a5b67b9

Comment by Gerrit Updater [ 07/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35666/
Subject: LU-12236 tests: add tests for LNET network namespace
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b20704d5f63a07c54cfbea331df90e6ca765e79b

Comment by Peter Jones [ 07/Sep/19 ]

Everything landed now I think

Comment by Aurelien Degremont (Inactive) [ 09/Sep/19 ]

Everything that was in the pipe, yes. The main feature has landed, but I still have other patches to add namespace support for the ptlrpc gss part.

 

Comment by Peter Jones [ 09/Sep/19 ]

OK. My suggestion is to create a new ticket to track this follow-on work. This will then make it easier for people to understand what functionality is in 2.13 vs 2.14.

Comment by Aurelien Degremont (Inactive) [ 10/Sep/19 ]

Ok, I will create a new ticket for this part of this feature.

Comment by Gerrit Updater [ 15/Nov/19 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/36769
Subject: LU-12236 lnet: support non-default network namespace
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 55de448b70b4456073ceaaa3e26f865f07c9f4c8

Comment by Gerrit Updater [ 15/Nov/19 ]

Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/36770
Subject: LU-12236 tests: add tests for LNET network namespace
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: ecccc31d4fc1f966a43f48712662080f39464c27

Comment by Gerrit Updater [ 12/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36769/
Subject: LU-12236 lnet: support non-default network namespace
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: bb4ef6bce3823668ca511915293f4991aa3cf75a

Comment by Gerrit Updater [ 12/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36770/
Subject: LU-12236 tests: add tests for LNET network namespace
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 2988fd1bbe42923158ebabbe9a89354d7e75d736

Comment by Gerrit Updater [ 04/May/20 ]

Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38476
Subject: LU-12236 gss: remove unused code in gss_svc_upcall.c
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 7a504a0439a1995b07655501d43d7c670a1d378b

Comment by Gerrit Updater [ 11/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38476/
Subject: LU-12236 gss: remove unused code in gss_svc_upcall.c
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 66d942b0a55feeb8bfb23179dac0c424d4cc089e
