[LU-9737] lnetctl net show command hung after add net Created: 05/Jul/17 Updated: 14/Jul/17 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question/Request | Priority: | Minor |
| Reporter: | sebg-crd-pm (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
lustre: 2.10.0-RC1 |
||
| Description |
|
Hi, I am testing Multi-Rail in 2.10.0-RC1 and added one network with the ib0 and ib1 interfaces. [test steps]
4. lnetctl net add --net o2ib0 --if ib0,ib1 [kernel message] |
| Comments |
| Comment by Peter Jones [ 05/Jul/17 ] |
|
The Multi-Rail instructions are in the manual - http://doc.lustre.org/lustre_manual.xhtml#lnetmr |
| Comment by Peter Jones [ 05/Jul/17 ] |
|
Amir, could you please assist with any follow-on questions relating to the instructions in the manual? Thanks, Peter |
| Comment by Amir Shehata (Inactive) [ 05/Jul/17 ] |
|
I'm unable to reproduce your problem locally. Is this reproducible 100% of the time? From the stack trace it appears that the ln_api_mutex is not being unlocked, causing a deadlock, but I don't see a problem in the code. How did you get 2.10-RC1? Did you build it yourself, or did you download the RPMs from somewhere? Are other users trying to run "lctl" commands at the same time when you encounter this problem? Do you have Lustre up, or are you loading LNet by itself? Can you also paste the output of the following command: lnetctl -h |
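One way to check the reproducibility question without tying up a terminal is to run the command in the background and poll it. The following is a minimal sketch, not part of Lustre: the helper name check_hang and the 124 return code are choices made for this example. It deliberately does not try to kill the process, since a task blocked on a kernel mutex may not respond to signals.

```shell
#!/bin/sh
# check_hang SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background and polls once per
# second. Returns the command's exit status if it finishes in time, or 124
# if it is still running after SECONDS.
check_hang() {
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        sleep 1
        waited=$((waited + 1))
        # kill -0 sends no signal; it only tests whether the process exists
        if ! kill -0 "$pid" 2>/dev/null; then
            wait "$pid"
            return $?
        fi
    done
    echo "still running after ${limit}s: $*" >&2
    return 124
}
```

For example, `check_hang 30 lnetctl net show` would return 124 if the command has not come back after 30 seconds.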
| Comment by sebg-crd-pm (Inactive) [ 06/Jul/17 ] |
|
Hi, #Is this reproducible 100% of the time? Because I had installed Lustre 2.9 on these nodes before installing Lustre 2.10-RC1. Another question, if "lnetctl net add --net o2ib0 --if ib0,ib1" works fine on the MGS node,
| Comment by sebg-crd-pm (Inactive) [ 06/Jul/17 ] |
|
Hi Amir Shehata, another question, if "lnetctl net add --net o2ib0 --if ib0,ib1" works fine on the MGS node,
| Comment by Olaf Weber [ 10/Jul/17 ] |
|
Hi Amir, this looks like a duplicate of |
| Comment by Olaf Weber [ 10/Jul/17 ] |
|
Based on my analysis of the source code and the procedure that created the hang, it is almost certain that this is a duplicate of Please note that even if the fix for To ensure that lnetctl will be built, install the rpms for libyaml and libyaml-devel on the build machine. This is not a hard requirement at the moment, but it will be in the future, because you need an up-to-date lnetctl to enable and configure new functionality like LNet Multi-Rail. Remember that to use lnetctl the libyaml rpm also has to be installed on all nodes. |
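The package requirement above can be checked before building or before running lnetctl on a node. This is a minimal sketch assuming an RPM-based distribution. The function name missing_pkgs and the PKG_QUERY variable are hypothetical, introduced only for this example. The package names are the ones given in the comment.

```shell
#!/bin/sh
# missing_pkgs NAME...
# Hypothetical helper: prints each package that the query command reports
# as not installed. PKG_QUERY is an assumption of this sketch; it defaults
# to "rpm -q" and exists so that the check can be tried on other systems.
missing_pkgs() {
    for pkg in "$@"; do
        # Left unquoted on purpose so "rpm -q" splits into command and flag
        ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

On a build machine one might run `missing_pkgs libyaml libyaml-devel`, and on the other nodes `missing_pkgs libyaml`. Empty output means nothing is missing.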
| Comment by sebg-crd-pm (Inactive) [ 14/Jul/17 ] |
|
Thanks for your kind reminder. => Remember that to use lnetctl the libyaml rpm also has to be installed on all nodes. |