[LU-7736] lustre_rmmod does not remove all the Lustre modules Created: 03/Feb/16 Updated: 09/May/17 Resolved: 09/May/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Gregoire Pichon | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Environment: |
Lustre 2.7.66 |
||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Description |
|
lustre_rmmod does not remove all the Lustre modules. A second call to the command does.
[root@rio10 ~]# modprobe lustre
[root@rio10 ~]# lctl list_nids
10.1.0.64@o2ib
[root@rio10 ~]# lustre_rmmod
Modules still loaded:
lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
[root@rio10 ~]# lustre_rmmod
After analysing the problem, it appears that:
By chance, in previous Lustre versions (2.1, 2.4 or 2.5) the dependency order made lustre_rmmod unload ptlrpc before ko2iblnd. Since Lustre 2.7, however, ko2iblnd is still in use when lustre_rmmod first tries to unload it, which in turn prevents lnet from unloading. The unload of lnet is never attempted again, so lnet and libcfs are still loaded at the end.
[root@rio10 ~]# modprobe lustre
[root@rio10 ~]# lustre_rmmod
DEBUG: rmmod lustre
DEBUG: rmmod mdc
DEBUG: rmmod fid
DEBUG: rmmod lmv
DEBUG: rmmod fld
DEBUG: rmmod lmv
rmmod: ERROR: Module lmv is not currently loaded
DEBUG: rmmod mdc
rmmod: ERROR: Module mdc is not currently loaded
DEBUG: rmmod lov
DEBUG: rmmod ko2iblnd
rmmod: ERROR: Module ko2iblnd is in use
DEBUG: rmmod ptlrpc
DEBUG: rmmod obdclass
DEBUG: rmmod ptlrpc
rmmod: ERROR: Module ptlrpc is not currently loaded
DEBUG: rmmod lnet
rmmod: ERROR: Module lnet is in use by: ko2iblnd
DEBUG: rmmod ko2iblnd
DEBUG: rmmod lustre
rmmod: ERROR: Module lustre is not currently loaded
DEBUG: rmmod obdclass
rmmod: ERROR: Module obdclass is not currently loaded
DEBUG: rmmod ptlrpc
rmmod: ERROR: Module ptlrpc is not currently loaded
DEBUG: rmmod libcfs
rmmod: ERROR: Module libcfs is in use by: lnet
Modules still loaded:
lnet/lnet/lnet.o libcfs/libcfs/libcfs.o |
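The ordering problem described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it is not the lustre_rmmod script and not the patch uploaded for this ticket (whose contents are not shown here). The names `USERS`, `ORDER`, `unload_pass` and `unload_until_stable` are hypothetical, the module graph is reduced to four modules, and the "ko2iblnd is in use while ptlrpc is loaded" behaviour is modelled as a plain dependency edge.

```python
# Simplified model (assumption): for each module, the set of modules that
# keep it "in use" while they are loaded.
USERS = {
    "libcfs": {"lnet"},
    "lnet": {"ko2iblnd"},
    "ko2iblnd": {"ptlrpc"},  # assumption: models the reference held while ptlrpc is loaded
    "ptlrpc": set(),
}

# Reduced version of the attempt order seen in the DEBUG trace:
# ko2iblnd is tried too early, and lnet is tried only once.
ORDER = ["ko2iblnd", "ptlrpc", "lnet", "ko2iblnd", "libcfs"]


def unload_pass(order, users, loaded):
    """Try each entry of `order` once; return the modules still loaded."""
    remaining = set(loaded)
    for mod in order:
        if mod not in remaining:
            continue  # like "Module X is not currently loaded"
        if users[mod] & remaining:
            continue  # like "Module X is in use"
        remaining.discard(mod)
    return remaining


def unload_until_stable(order, users, loaded):
    """Repeat passes until nothing more can be unloaded."""
    remaining = set(loaded)
    while True:
        after = unload_pass(order, users, remaining)
        if after == remaining:
            return after
        remaining = after


if __name__ == "__main__":
    first = unload_pass(ORDER, USERS, set(USERS))
    print("after one pass:", sorted(first))
    print("after a second pass:", sorted(unload_pass(ORDER, USERS, first)))
    print("retrying until stable:", sorted(unload_until_stable(ORDER, USERS, set(USERS))))
```

In this model a single pass leaves lnet and libcfs loaded, and a second pass removes them, which matches the behaviour reported in the description. Retrying until a pass makes no progress is one generic way to make the result independent of the attempt order; reordering the attempts so that ptlrpc goes before ko2iblnd would be another.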
| Comments |
| Comment by Gerrit Updater [ 03/Feb/16 ] |
|
Grégoire Pichon (gregoire.pichon@bull.net) uploaded a new patch: http://review.whamcloud.com/18279 |
| Comment by Peter Jones [ 03/Feb/16 ] |
|
Bob, could you please look after this patch? Thanks, Peter |
| Comment by James Nunez (Inactive) [ 03/Mar/16 ] |
|
We might be seeing this in our autotest results. For the POSIX test results at https://testing.hpdd.intel.com/test_sets/03f82950-e14e-11e5-8edf-5254006e85c2, it looks like not all modules are removed. The last thing in the suite_stdout is:
...
04:52:08:Stopping /mnt/ost8 (opts:-f) on onyx-34vm8
04:52:08:CMD: onyx-34vm8 umount -d -f /mnt/ost8
04:52:19:CMD: onyx-34vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
04:52:19:CMD: onyx-34vm1.onyx.hpdd.intel.com lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
04:52:19: 15 UP osc lustre-OST0000_osc lustre-OST0000_osc_UUID 4
04:52:19: 17 UP osc lustre-OST0001_osc lustre-OST0001_osc_UUID 4
04:52:19: 19 UP osc lustre-OST0002_osc lustre-OST0002_osc_UUID 4
04:52:19: 21 UP osc lustre-OST0003_osc lustre-OST0003_osc_UUID 4
04:52:19: 23 UP osc lustre-OST0004_osc lustre-OST0004_osc_UUID 4
04:52:19: 25 UP osc lustre-OST0005_osc lustre-OST0005_osc_UUID 4
04:52:19: 27 UP osc lustre-OST0006_osc lustre-OST0006_osc_UUID 4
04:52:19: 29 UP osc lustre-OST0007_osc lustre-OST0007_osc_UUID 4
04:52:19:Modules still loaded:
04:52:19:lustre/osc/osc.o lustre/ptlrpc/ptlrpc.o lustre/obdclass/obdclass.o lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
05:51:43:********** Timeout by autotest system ********** |
| Comment by Bob Glossman (Inactive) [ 10/Mar/16 ] |
|
I haven't been able to reproduce the problem, since I have no IB hardware on hand; I don't see it with only ksocklnd loaded. I don't know for sure that the patch fixes the whole problem, but I've given it +review anyway. |
| Comment by Gerrit Updater [ 14/Mar/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18279/ |
| Comment by John Salinas (Inactive) [ 06/May/17 ] |
|
I appear to be seeing this issue:
Load lnet
# pdsh -w node0[1-8] "modprobe lnet"
# pdsh -w node0[1-8] "/usr/sbin/lustre_rmmod"
node08: Modules still loaded:
node08: lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
pdsh@natasha: node08: ssh exited with exit code 1
node01: Modules still loaded:
node01: lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
pdsh@natasha: node01: ssh exited with exit code 1
node05: Modules still loaded:
node05: lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
pdsh@natasha: node05: ssh exited with exit code 1
node07: Modules still loaded:
node07: lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
pdsh@natasha: node07: ssh exited with exit code 1
node03: Modules still loaded:
node03: lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
pdsh@natasha: node03: ssh exited with exit code 1
node02: Modules still loaded:
node02: lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
pdsh@natasha: node02: ssh exited with exit code 1
node04: Modules still loaded:
node04: lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
pdsh@natasha: node04: ssh exited with exit code 1
node06: Modules still loaded:
node06: lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
pdsh@natasha: node06: ssh exited with exit code 1
Try to unload, but we cannot remove either lnet or ksocklnd
[root@natasha jsalinas]# pdsh -w node0[1-8] "lsmod |grep lnet "
node08: lnet 444969 2 ksocklnd
node08: libcfs 405310 2 lnet,ksocklnd
node07: lnet 444969 2 ksocklnd
node04: lnet 444969 2 ksocklnd
node07: libcfs 405310 2 lnet,ksocklnd
node04: libcfs 405310 2 lnet,ksocklnd
node03: lnet 444969 2 ksocklnd
node03: libcfs 405310 2 lnet,ksocklnd
node02: lnet 444969 2 ksocklnd
node02: libcfs 405310 2 lnet,ksocklnd
node05: lnet 444969 2 ksocklnd
node05: libcfs 405310 2 lnet,ksocklnd
node06: lnet 444969 2 ksocklnd
node06: libcfs 405310 2 lnet,ksocklnd
node01: lnet 444969 2 ksocklnd
node01: libcfs 405310 2 lnet,ksocklnd
[root@natasha jsalinas]# pdsh -w node0[1-8] "lsmod |grep ksocklnd"
node03: ksocklnd 179299 1
node03: lnet 444969 2 ksocklnd
node03: libcfs 405310 2 lnet,ksocklnd
node01: ksocklnd 179299 1
node01: lnet 444969 2 ksocklnd
node01: libcfs 405310 2 lnet,ksocklnd
node07: ksocklnd 179299 1
node07: lnet 444969 2 ksocklnd
node07: libcfs 405310 2 lnet,ksocklnd
node05: ksocklnd 179299 1
node05: lnet 444969 2 ksocklnd
node05: libcfs 405310 2 lnet,ksocklnd
node08: ksocklnd 179299 1
node08: lnet 444969 2 ksocklnd
node08: libcfs 405310 2 lnet,ksocklnd
node02: ksocklnd 179299 1
node02: lnet 444969 2 ksocklnd
node02: libcfs 405310 2 lnet,ksocklnd
node04: ksocklnd 179299 1
node04: lnet 444969 2 ksocklnd
node04: libcfs 405310 2 lnet,ksocklnd
node06: ksocklnd 179299 1
node06: lnet 444969 2 ksocklnd
node06: libcfs 405310 2 lnet,ksocklnd
Can't win this battle:
[root@node01 ~]# rpm -qa |grep lustre |
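The lsmod output above can be read mechanically. The sketch below parses lsmod lines (columns: module, size, use count, comma-separated user modules) and reports modules whose use count is higher than the number of modules listed as users. The helper names `parse_lsmod` and `unexplained_refs` are hypothetical, and the sample text is one node's output from this comment with the `nodeNN:` prefix removed.

```python
from typing import NamedTuple


class ModInfo(NamedTuple):
    name: str
    size: int
    refcount: int
    users: tuple


def parse_lsmod(text):
    """Parse lsmod-style lines into {module name: ModInfo}."""
    mods = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        users = ()
        if len(fields) > 3:
            users = tuple(u for u in fields[3].split(",") if u)
        mods[fields[0]] = ModInfo(fields[0], int(fields[1]), int(fields[2]), users)
    return mods


def unexplained_refs(mods):
    """Return {module: count} for references not accounted for by listed users."""
    return {
        m.name: m.refcount - len(m.users)
        for m in mods.values()
        if m.refcount > len(m.users)
    }


SAMPLE = """\
ksocklnd 179299 1
lnet 444969 2 ksocklnd
libcfs 405310 2 lnet,ksocklnd
"""

if __name__ == "__main__":
    print(unexplained_refs(parse_lsmod(SAMPLE)))
```

For the sample, ksocklnd and lnet each show one reference that no listed module accounts for. One plausible reading, offered here as an assumption rather than a diagnosis, is that something other than a dependent module still holds those references, which would explain why neither module can be removed.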
| Comment by John Salinas (Inactive) [ 06/May/17 ] |
|
This appears to happen in the step between modprobe lnet and lctl network up |
| Comment by Andreas Dilger [ 08/May/17 ] |
|
There is work under |
| Comment by John Salinas (Inactive) [ 09/May/17 ] |
|
Perhaps I missed it but I didn't see any specific mention of ko2iblnd in |
| Comment by Peter Jones [ 09/May/17 ] |
|
Can you please open a new ticket to track the similar issue that you are seeing for the 2.10 release? |
| Comment by John Salinas (Inactive) [ 09/May/17 ] |
|
Will do |