[LU-10827] conf-sanity test 0 fails with ‘rmmod: ERROR: Module lustre is in use’ Created: 19/Mar/18  Updated: 29/Mar/18  Resolved: 29/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Yang Sheng
Resolution: Duplicate Votes: 0
Labels: ubuntu
Environment:

Ubuntu clients


Issue Links:
Related
is related to LU-6867 change test-framework to detect activ... Reopened
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

conf-sanity tests 0, 1, 2, 3, 4, 5a/b/c/d and many others fail with the following error when trying to shut down the file system

stop mds service on onyx-50vm9
CMD: onyx-50vm9 grep -c /mnt/lustre-mds1' ' /proc/mounts || true
Stopping /mnt/lustre-mds1 (opts:-f) on onyx-50vm9
CMD: onyx-50vm9 umount -d -f /mnt/lustre-mds1
CMD: onyx-50vm9 lsmod | grep lnet > /dev/null &&
lctl dl | grep ' ST ' || true
CMD: onyx-50vm6.onyx.hpdd.intel.com lsmod | grep lnet > /dev/null &&
lctl dl | grep ' ST ' || true
rmmod: ERROR: Module lustre is in use
conf-sanity test_0: @@@@@@ FAIL: cleanup failed with 203

Unmounting the OSTs and the MDT seems to work, but calling rmmod on the client fails; the output above is from the suite_log for https://testing.hpdd.intel.com/test_sets/5d846520-287c-11e8-9e0e-52540065bddc.
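A quick way to see what is still pinning the lustre module on the client before cleanup calls rmmod (generic diagnostics run on the client node, not part of the test framework; the mount point is the one used in this report):

# Anything still mounted? rmmod fails while a Lustre mount exists.
mount -t lustre

# Module use count and dependent modules.
lsmod | grep '^lustre'

# User-space processes still holding files open under the mount point.
lsof /mnt/lustre 2>/dev/null
fuser -vm /mnt/lustre 2>/dev/null

# Lustre devices still configured; leftover devices keep modules referenced.
lctl dl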

 

Looking at the client console (vm6), we see

Ubuntu 16.04.2 LTS trevis-4vm3.trevis.hpdd.intel.com ttyS0

trevis-4vm3 login: [    8.165539] audit: type=1400 audit(1521039871.976:11): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd" name="/etc/gss/mech.d/" pid=547 comm="sssd_be" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[   81.009062] random: nonblocking pool is initialized
[  138.440162] libcfs: module verification failed: signature and/or required key missing - tainting kernel

We don’t see this during RHEL 7 testing.
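For reference, a way to confirm on the Ubuntu client that the taint flag comes from the unsigned out-of-tree modules (generic diagnostics; libcfs is the module named in the console output above):

# Kernel taint bitmask; bit 13 (8192, flag 'E') is TAINT_UNSIGNED_MODULE.
cat /proc/sys/kernel/tainted

# Per-module taint flags and signature fields for the module from the console log.
cat /sys/module/libcfs/taint
modinfo libcfs | grep -iE 'signer|sig_key|sig_hashalgo'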

 

In the client dmesg log, we see an error:

[24874.129276] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts
[24874.137005] Lustre: DEBUG MARKER: lsof -t /mnt/lustre
[24880.900494] LustreError: 167-0: lustre-MDT0000-mdc-ffff880061827800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[24880.902374] LustreError: 8353:0:(file.c:4213:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -5
[24880.905171] Lustre: lustre-MDT0000-mdc-ffff880061827800: Connection restored to 10.2.9.244@tcp (at 10.2.9.244@tcp)
[24881.073111] Lustre: DEBUG MARKER: umount /mnt/lustre 2>&1
[24881.110161] Lustre: Unmounted lustre-client

 

So far, this issue has only been seen when testing Ubuntu clients; the failures started on 2018-02-27 22:03:52 UTC.

 

Logs for the failures are at

https://testing.hpdd.intel.com/test_sets/f75808be-1cb5-11e8-a7cd-52540065bddc

https://testing.hpdd.intel.com/test_sets/4aeef8ce-1de8-11e8-bd91-52540065bddc

https://testing.hpdd.intel.com/test_sets/cf7f2d1a-1f29-11e8-b046-52540065bddc

https://testing.hpdd.intel.com/test_sets/a268caba-2894-11e8-b3c6-52540065bddc



 Comments   
Comment by Peter Jones [ 20/Mar/18 ]

Yang Sheng

Could you please investigate?

Thanks

Peter

Comment by Yang Sheng [ 27/Mar/18 ]

From the test script, the module unload should not be called here since this is a combined MGS/MDS environment. It looks like this part should be run on the MDS instead of on the client. I will push a patch to fix it.

 

Thanks,

Yangsheng
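A hedged sketch of the approach Yang Sheng describes above, in the style of the test scripts (assumptions: the test-framework helpers combined_mgs_mds, do_facet and error behave as in test-framework.sh, and unload_modules_conf is the existing cleanup path; this is illustrative only and is not the content of the patch below):

# Illustrative sketch only -- not the actual change in https://review.whamcloud.com/31793.
cleanup_modules() {
    if combined_mgs_mds; then
        # Combined MGS/MDS: run the module unload on the MDS facet's host,
        # not on the client, where lustre.ko can still be held by the mount.
        do_facet mds1 lustre_rmmod ||
            error "module unload failed on mds1"
    else
        unload_modules_conf
    fi
}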

Comment by Gerrit Updater [ 27/Mar/18 ]

Yang Sheng (yang.sheng@intel.com) uploaded a new patch: https://review.whamcloud.com/31793
Subject: LU-10827 tests: unload_modules_conf should run on mds
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4cb7aabdf935339d30dfb89ac6755116aa20b944

Comment by James Nunez (Inactive) [ 28/Mar/18 ]

We reverted the patch for LU-6867, https://review.whamcloud.com/#/c/15638/, and conf-sanity run with Ubuntu clients now passes all testing.

 

The revert patch is at https://review.whamcloud.com/#/c/31798/ and the conf-sanity results are at https://testing.hpdd.intel.com/test_sessions/19e2923f-bfde-498a-a827-583c610cd040. 

Comment by James Nunez (Inactive) [ 29/Mar/18 ]

After we reverted LU-6867, this issue went away. Closing as a duplicate of LU-6867.
