[LU-12236] Support more than the default root network namespace Created: 27/Apr/19 Updated: 05/Jun/20 Resolved: 07/Sep/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.5 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Aurelien Degremont (Inactive) | Assignee: | Aurelien Degremont (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Description |
| Comments |
| Comment by Gerrit Updater [ 27/Apr/19 ] |
|
Aurelien Degremont (degremoa@amazon.com) uploaded a new patch: https://review.whamcloud.com/34768 |
| Comment by Aurelien Degremont (Inactive) [ 27/Apr/19 ] |
|
This patch is a first draft for review and comments |
| Comment by Aurelien Degremont (Inactive) [ 02/May/19 ] |
|
One more patch that removes dead code, needed before implementing GSS support for network namespaces. |
| Comment by Sebastien Buisson [ 09/May/19 ] |
|
Thanks for the patches, Aurélien. Could you please document here how you are running Lustre in a container? Are you able to share a script that illustrates a use case for which your patch is helpful (i.e. with the patch it works, without it fails)?
I tried your patch https://review.whamcloud.com/34768 with a privileged Docker container, without sharing the host's network namespace (so the container has its own network namespace):
host# docker run -ti --privileged --name test_1 lustre_cli_1:latest /bin/bash
I used pipework to create a network interface specific to the container:
host# pipework br1 test_1 10.128.11.189/21
Then, after installing the lustre package from within the container, I tried 'lnetctl net add' on that interface:
container# rpm -ivh /tmp/lustre-2.12.52_97_g2342994-1.el7.x86_64.rpm
container# lnetctl lnet configure ; lnetctl net add --net tcp1 --if eth1
Unfortunately, it failed with the following errors:
[4426281.034106] LNetError: 23358:0:(lib-socket.c:129:lnet_ipif_query()) Can't get flags for interface eth1
[4426281.034599] LNetError: 23358:0:(socklnd.c:2850:ksocknal_startup()) Can't get interface eth1 info: -19
[4426282.035079] LNetError: 105-4: Error -100 starting up LNI tcp
This is probably because the interface is looked up in the host's network namespace, not the container's.
Thanks, |
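For reference, the `-19` in the `ksocknal_startup()` message is `-ENODEV` on Linux: no interface named `eth1` was found in the namespace where the lookup ran. Below is a minimal userspace sketch of that kind of lookup, assuming Linux and the standard `SIOCGIFFLAGS` ioctl; `query_if_flags()` is a hypothetical helper, not LNet's `lnet_ipif_query()`. A userspace socket is tied to the network namespace of the process that creates it, so the same interface name can resolve in one namespace and fail in another.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): fetch the flags of interface
 * `name` as seen from the network namespace of the calling process.
 * Returns 0 and fills *flags on success, or a negative errno, e.g.
 * -ENODEV (-19) when no interface of that name exists in this namespace.
 */
int query_if_flags(const char *name, short *flags)
{
    struct ifreq ifr;
    int fd, rc = 0;

    if (strlen(name) >= IFNAMSIZ)
        return -EINVAL;

    /* The socket belongs to the caller's current network namespace. */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -errno;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0)
        rc = -errno;              /* -ENODEV if the name is unknown here */
    else
        *flags = ifr.ifr_flags;

    close(fd);
    return rc;
}
```

Under these assumptions, `query_if_flags("eth1", &flags)` run inside the container would be expected to succeed, while the same call from the host's namespace would return `-ENODEV`, mirroring the error above.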
| Comment by Aurelien Degremont (Inactive) [ 09/May/19 ] |
Exactly! This is the use case I'm addressing. You need to dedicate the interface to a network namespace and run the Lustre setup inside that namespace, like:
host# ip netns add blue
host# ip link set eth1 netns blue
host# ip netns exec blue bash
netns# ifconfig eth1 ....
netns# modprobe ksocklnd
netns# lnetctl lnet configure
netns# lnetctl net add --net tcp1 --if eth1
|
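The `ip netns exec blue bash` step above is what makes the subsequent `modprobe`/`lnetctl` commands run in the `blue` network namespace. For the network part, this roughly amounts to opening the namespace file and calling `setns()`. The following is a minimal sketch assuming Linux and glibc's `setns()` wrapper; `enter_netns()` is a hypothetical helper. `ip netns exec` also sets up a private mount namespace and remounts `/sys`, which this sketch does not do.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): move the calling thread into
 * the network namespace referred to by `path`, e.g. "/var/run/netns/blue"
 * (created by `ip netns add blue`) or "/proc/<pid>/ns/net".
 * Only the network namespace changes; the mount namespace is untouched.
 * Needs CAP_SYS_ADMIN; otherwise setns() fails with EPERM.
 * Returns 0 on success or a negative errno.
 */
int enter_netns(const char *path)
{
    int fd, rc = 0;

    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -errno;            /* e.g. -ENOENT if the netns is unknown */

    if (setns(fd, CLONE_NEWNET) < 0)
        rc = -errno;

    close(fd);
    return rc;
}
```

After a successful `enter_netns("/var/run/netns/blue")`, sockets created by the process (and tools it executes) would see `eth1` as configured inside `blue`.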
| Comment by Sebastien Buisson [ 10/May/19 ] |
|
pipework is a very popular and convenient way to manipulate network namespaces and containers. Here is a rough list of the actions carried out by pipework when running the 'pipework br1 test_1 10.128.11.189/21' command to add an eth1 interface inside the container named test_1 (whose PID is 25578), with network address 10.128.11.189/21 (br1 being the bridge interface created on the host):
rm -f /var/run/netns/25578
ln -s /proc/25578/ns/net /var/run/netns/25578
ip link add name veth1pl25578 mtu 1500 type veth peer name veth1pg25578 mtu 1500
ip link set veth1pl25578 master br1
ip link set veth1pl25578 up
ip link set veth1pg25578 netns 25578
ip netns exec 25578 ip link set veth1pg25578 name eth1
ip netns exec 25578 ip addr add 10.128.11.189/21 brd 10.128.15.255 dev eth1
ip netns exec 25578 ip link set eth1 up
So this is somewhat similar to what you are doing by hand. However, it still fails with your patch, with the errors given above. I think the difference is that pipework creates a new interface specifically for the container, in the container's namespace. In your case, you are just moving an interface that exists on the host to a different namespace (making it exclusive to that namespace, and preventing it from being used in any other namespace or even on the host). This might hide some issues, in particular the fact that some actions carried out by lnetctl may not be done in the expected namespace.
By the way, you are always in the host's mount namespace in your tests, because as far as I can see you are only manipulating the network namespace. This is very different when running in actual containers (whether Docker containers or just basic LXC containers), where you usually do not have access to the Lustre kernel modules, for instance, as /lib/modules is not exported to the container.
|
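One way to check whether a process really runs in the expected network namespace is to compare namespace identities directly. A minimal sketch, assuming Linux `/proc` and the rule from namespaces(7) that two namespace files refer to the same namespace exactly when their device and inode numbers match; `same_netns()` is a hypothetical helper. Both paths must be resolvable from the caller's mount namespace, which differs between the host and a real container.

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (illustration only): report whether two namespace
 * files, such as "/proc/self/ns/net" and "/var/run/netns/25578"
 * (pipework's symlink to "/proc/25578/ns/net"), refer to the same
 * network namespace.
 * Returns 1 if same, 0 if different, or a negative errno if stat() fails.
 */
int same_netns(const char *a, const char *b)
{
    struct stat sa, sb;

    /* stat() follows the /proc namespace link to the underlying nsfs inode. */
    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -errno;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```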
| Comment by Sebastien Buisson [ 14/May/19 ] |
|
For the sake of clarity, the test scenario described in the comment https://jira.whamcloud.com/browse/LU-12236?focusedCommentId=246900&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-246900 above fails both with and without the patch at https://review.whamcloud.com/34768. So this is not a regression from this patch, but this use case should be addressed by the patch. |
| Comment by Peter Jones [ 16/May/19 ] |
|
Aurelien Degremont (degremoa@amazon.com) uploaded a new patch: https://review.whamcloud.com/34794 |
| Comment by Aurelien Degremont (Inactive) [ 22/May/19 ] |
|
Sébastien, I tried something very similar to the command list you posted (create a pair of veth devices, put one in the container with an IP address, and use it in lnetctl) and it works for me. I was able to configure the virtual interface. I did not try with pipework and Docker themselves, as I do not have them handy. I will use this command list as a baseline to try to write a test for this feature for Maloo. |
| Comment by Gerrit Updater [ 25/May/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34794/ |
| Comment by Peter Jones [ 25/May/19 ] |
|
Landed for 2.13 |
| Comment by Aurelien Degremont (Inactive) [ 27/May/19 ] |
|
This first patch was only a cleanup patch before the real ones. At least 2 others are coming (the first one being https://review.whamcloud.com/34768). |
| Comment by Sebastien Buisson [ 11/Jun/19 ] |
|
Good news: with patchset 5 of https://review.whamcloud.com/34768, I am able to successfully run the scenario described in the comment https://jira.whamcloud.com/browse/LU-12236?focusedCommentId=246900&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-246900. So using a network interface for LNet from within a Docker container works with this patch. |
| Comment by Aurelien Degremont (Inactive) [ 12/Jun/19 ] |
|
That's good news! I will address your comments in Gerrit.
|
| Comment by Gerrit Updater [ 01/Aug/19 ] |
|
Aurelien Degremont (degremoa@amazon.com) uploaded a new patch: https://review.whamcloud.com/35666 |
| Comment by Gerrit Updater [ 15/Aug/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34768/ |
| Comment by Gerrit Updater [ 22/Aug/19 ] |
|
Aurelien Degremont (degremoa@amazon.com) uploaded a new patch: https://review.whamcloud.com/35859 |
| Comment by Gerrit Updater [ 07/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35666/ |
| Comment by Peter Jones [ 07/Sep/19 ] |
|
Everything has landed now, I think. |
| Comment by Aurelien Degremont (Inactive) [ 09/Sep/19 ] |
|
Everything that was in the pipeline, yes. The main feature has landed, but I still have other patches to add namespace support to the ptlrpc GSS part.
|
| Comment by Peter Jones [ 09/Sep/19 ] |
|
OK. My suggestion is to create a new ticket to track this follow-on work. That will make it easier for people to understand what functionality is in 2.13 vs 2.14. |
| Comment by Aurelien Degremont (Inactive) [ 10/Sep/19 ] |
|
OK, I will create a new ticket for this part of the feature. |
| Comment by Gerrit Updater [ 15/Nov/19 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/36769 |
| Comment by Gerrit Updater [ 15/Nov/19 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/36770 |
| Comment by Gerrit Updater [ 12/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36769/ |
| Comment by Gerrit Updater [ 12/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36770/ |
| Comment by Gerrit Updater [ 04/May/20 ] |
|
Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38476 |
| Comment by Gerrit Updater [ 11/May/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38476/ |