[LU-12126] The lustre test suite falls over when running with a modern LNet DLC configuration with o2iblnd Created: 28/Mar/19  Updated: 23/Aug/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.3
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: James A Simmons Assignee: Amir Shehata (Inactive)
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Setting up any test suite to run tests that use a post-Lustre 2.7 DLC configuration with ko2iblnd.


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Currently many of the Lustre test suites, such as lustre-lfsck, fail when I attempt to run them on my test bed. This is because I am running with a ko2iblnd DLC Multi-Rail configuration, while the test suite assumes a TCP setup that uses modprobe.conf for ksocklnd only. When running, I see failures like:

rmmod: ERROR: Module ko2iblnd is in use
rmmod: ERROR: Module lnet is in use by: ko2iblnd
rmmod: ERROR: Module ko2iblnd is in use
rmmod: ERROR: Module libcfs is in use by: lnet ko2iblnd
modules unloaded.
Stopping clients: ninja82 /lustre/lustre (opts:-f)

Also lustre_rmmod will completely fail with an o2iblnd LNet DLC multi-rail configuration.

Simply running lustre_rmmod on a system that was set up with lnetctl will show these problems.
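
A possible way to reproduce this by hand is sketched below. It is only an assumption about the minimal steps, not the reporter's actual setup: the function name repro_rmmod_failure is hypothetical, and the network name o2ib and interface ib0 are placeholders that must match the local fabric.

```shell
#!/bin/sh
# Hypothetical reproduction helper; not part of the Lustre test suite.
# Usage: repro_rmmod_failure [net] [interface]
repro_rmmod_failure() {
    net="${1:-o2ib}"     # assumed LNet network name
    iface="${2:-ib0}"    # assumed InfiniBand interface name

    # Load LNet and bring it up through DLC instead of modprobe.conf.
    modprobe lnet || return 1
    lnetctl lnet configure || return 1
    lnetctl net add --net "$net" --if "$iface" || return 1

    # With the network still configured, this is expected to report
    # "Module ko2iblnd is in use", as described in this ticket.
    lustre_rmmod
}
```

Calling repro_rmmod_failure as root on a node with an IB interface should return the exit status of lustre_rmmod, so a non-zero result indicates the failure was reproduced.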



 Comments   
Comment by Peter Jones [ 28/Mar/19 ]

Sonia

Could you please advise?

Peter

Comment by James Nunez (Inactive) [ 23/Aug/19 ]

Now that we are running some testing with IB, we are seeing this issue in autotest. For example, we see mmp fail with

rmmod: ERROR: Module ko2iblnd is in use

at https://testing.whamcloud.com/test_sets/0f269e38-be9f-11e9-97d5-52540065bddc

Even when running the tests manually, we see similar issues. I ran mmp and got the above error, then tried to clean up the node and saw:

# ./llmountcleanup.sh 
Stopping clients: s143 /es15 (opts:-f)
Stopping clients: s143 /es152 (opts:-f)
pdsh@s143: sv131-ib0: ssh exited with exit code 1
pdsh@s143: sv134-ib0: ssh exited with exit code 1
pdsh@s143: sv135-ib0: ssh exited with exit code 1
rmmod: ERROR: Module ko2iblnd is in use
# mount | grep lustre
# lustre_rmmod
rmmod: ERROR: Module ko2iblnd is in use

Running 'lnetctl lnet unconfigure' allowed me to clean up the node.
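
The workaround above could be wrapped roughly as follows. This is a sketch under assumptions, not a proposed patch to lustre_rmmod: the function name cleanup_lnet_modules and the MODULES_FILE override are hypothetical, and it assumes all Lustre file systems on the node are already unmounted.

```shell
#!/bin/sh
# Hypothetical cleanup wrapper: unconfigure DLC-managed LNet first,
# then unload the Lustre/LNet modules.
cleanup_lnet_modules() {
    # MODULES_FILE is an illustrative override; defaults to /proc/modules.
    modules_file="${MODULES_FILE:-/proc/modules}"

    # Only unconfigure when the lnet module is actually loaded.
    if grep -q '^lnet ' "$modules_file" 2>/dev/null; then
        lnetctl lnet unconfigure || return 1
    fi

    # With the networks torn down, ko2iblnd should no longer be pinned.
    lustre_rmmod
}
```

Running cleanup_lnet_modules after llmountcleanup.sh mirrors the manual sequence from this comment: unconfigure LNet, then call lustre_rmmod.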
