[LU-12126] The lustre test suite falls over when running with a modern LNet DLC configuration with o2iblnd Created: 28/Mar/19 Updated: 23/Aug/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | James A Simmons | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Any test suite set up to run tests that use a post-Lustre 2.7 DLC configuration with ko2iblnd. |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Currently many of the Lustre test suites, such as lustre-lfsck, fail outright when run in my test bed, because I am running a ko2iblnd DLC Multi-Rail configuration. The test suite assumes a TCP setup that uses modprobe.conf for ksocklnd only. When running the tests I see failures like:
    rmmod: ERROR: Module ko2iblnd is in use
    rmmod: ERROR: Module lnet is in use by: ko2iblnd
    rmmod: ERROR: Module ko2iblnd is in use
    rmmod: ERROR: Module libcfs is in use by: lnet ko2iblnd
    modules unloaded.
    Stopping clients: ninja82 /lustre/lustre (opts:-f)
lustre_rmmod also fails completely with an o2iblnd LNet DLC Multi-Rail configuration. Simply executing lustre_rmmod on a system that was set up with lnetctl reproduces these problems. |
| Comments |
| Comment by Peter Jones [ 28/Mar/19 ] |
|
Sonia, could you please advise? Peter |
| Comment by James Nunez (Inactive) [ 23/Aug/19 ] |
|
Now that we are running some testing with IB, we are seeing this issue in autotest. For example, we see mmp fail with 'rmmod: ERROR: Module ko2iblnd is in use' at https://testing.whamcloud.com/test_sets/0f269e38-be9f-11e9-97d5-52540065bddc
Even when running tests manually, we see similar issues. I ran mmp and got the above error. I then tried to clean up the node and saw:
    # ./llmountcleanup.sh
    Stopping clients: s143 /es15 (opts:-f)
    Stopping clients: s143 /es152 (opts:-f)
    pdsh@s143: sv131-ib0: ssh exited with exit code 1
    pdsh@s143: sv134-ib0: ssh exited with exit code 1
    pdsh@s143: sv135-ib0: ssh exited with exit code 1
    rmmod: ERROR: Module ko2iblnd is in use
    # mount | grep lustre
    # lustre_rmmod
    rmmod: ERROR: Module ko2iblnd is in use
Running 'lnetctl lnet unconfigure' allowed me to clean up the node. |
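The manual workaround in the comment above amounts to unconfiguring LNet before unloading the modules. A minimal sketch of that ordering follows, assuming all Lustre file systems are already unmounted (as the empty 'mount | grep lustre' output suggests). The names cleanup_lnet, run, and DRY_RUN are hypothetical and exist only for this illustration. This is not a proposed fix for the test framework.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper illustrating the order used in the manual cleanup.
cleanup_lnet() {
    # Tear down the LNet configuration that was created through lnetctl,
    # so that ko2iblnd is no longer reported as in use.
    run lnetctl lnet unconfigure || return 1
    # With LNet unconfigured, attempt the normal module unload.
    run lustre_rmmod
}

# Usage:
#   DRY_RUN=1; cleanup_lnet    # show the commands without running them
#   cleanup_lnet               # run them (as root, on an unmounted node)
```

The dry-run switch is there so the sequence can be checked on a node without touching its LNet state.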