[LU-9439] Introduce an lnet systemd service Created: 02/May/17 Updated: 07/Nov/18 Resolved: 03/Jun/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.0, Lustre 2.11.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Giuseppe Di Natale (Inactive) | Assignee: | Dmitry Eremin (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This is effectively porting the initd version of the lnet service to systemd. This ticket should:
|
| Comments |
| Comment by Gerrit Updater [ 02/May/17 ] |
|
Giuseppe Di Natale (dinatale2@llnl.gov) uploaded a new patch: https://review.whamcloud.com/26925 |
| Comment by Giuseppe Di Natale (Inactive) [ 02/May/17 ] |
|
I'd also like to provide a sample lnet.conf yaml file for lnetctl as part of this ticket. Could someone point me to a suitable example file? |
| Comment by Peter Jones [ 03/May/17 ] |
|
Amir, do you have a suitable example to share? Peter |
| Comment by Amir Shehata (Inactive) [ 03/May/17 ] |
|
I can provide an example, but the YAML file format has changed in the latest master. We still support the older format, but since this ticket is on master, below is a sample of the latest YAML config file.
net:
    - net type: o2ib1
      local NI(s):
        - nid: 172.16.1.4@o2ib1
          status: up
          interfaces:
              0: ib0
          statistics:
              send_count: 7
              recv_count: 7
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tcp bonding: 0
          dev cpt: 0
          CPT: "[0,1]"
        - nid: 172.16.2.4@o2ib1
          status: up
          interfaces:
              0: ib1
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tcp bonding: 0
          dev cpt: 1
          CPT: "[0,1]"
route:
    - net: o2ib
      gateway: 172.16.1.1@o2ib1
      hop: -1
      priority: 0
      state: down
peer:
    - primary nid: 192.168.1.2@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 192.168.1.2@o2ib
          state: NA
          max_ni_tx_credits: 0
          available_tx_credits: 0
          min_tx_credits: 0
          tx_q_num_of_buf: 0
          available_rtr_credits: 0
          min_rtr_credits: 0
          send_count: 0
          recv_count: 0
          drop_count: 0
          refcount: 2
        - nid: 192.168.2.2@o2ib
          state: NA
          max_ni_tx_credits: 0
          available_tx_credits: 0
          min_tx_credits: 0
          tx_q_num_of_buf: 0
          available_rtr_credits: 0
          min_rtr_credits: 0
          send_count: 0
          recv_count: 0
          drop_count: 0
          refcount: 2
    - primary nid: 172.16.1.1@o2ib1
      Multi-Rail: True
      peer ni:
        - nid: 172.16.1.1@o2ib1
          state: up
          max_ni_tx_credits: 128
          available_tx_credits: 128
          min_tx_credits: 127
          tx_q_num_of_buf: 0
          available_rtr_credits: 128
          min_rtr_credits: 128
          send_count: 7
          recv_count: 7
          drop_count: 0
          refcount: 4
        - nid: 172.16.2.1@o2ib1
          state: NA
          max_ni_tx_credits: 128
          available_tx_credits: 128
          min_tx_credits: 127
          tx_q_num_of_buf: 0
          available_rtr_credits: 128
          min_rtr_credits: 128
          send_count: 0
          recv_count: 0
          drop_count: 0
          refcount: 1
|
| Comment by Christopher Morrone [ 03/May/17 ] |
|
Is that really an input file, or was that output? For instance, "status" doesn't seem like something that would appear in input. |
| Comment by Amir Shehata (Inactive) [ 04/May/17 ] |
|
This is an output. But the way it's designed, you can feed the output YAML config back in as input. The code will only look at relevant parameters. Here is a cleaned input file, with the unnecessary parameters removed:
net:
    - net type: o2ib1
      local NI(s):
        - nid: 172.16.1.4@o2ib1
          interfaces:
              0: ib0
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          CPT: "[0,1]"
        - nid: 172.16.2.4@o2ib1
          interfaces:
              0: ib1
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          CPT: "[0,1]"
route:
    - net: o2ib
      gateway: 172.16.1.1@o2ib1
      hop: -1
      priority: 0
peer:
    - primary nid: 192.168.1.2@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 192.168.1.2@o2ib
        - nid: 192.168.2.2@o2ib
    - primary nid: 172.16.1.1@o2ib1
      Multi-Rail: True
      peer ni:
        - nid: 172.16.1.1@o2ib1
        - nid: 172.16.2.1@o2ib1
|
| Comment by Gerrit Updater [ 05/May/17 ] |
|
Giuseppe Di Natale (dinatale2@llnl.gov) uploaded a new patch: https://review.whamcloud.com/26959 |
| Comment by Giuseppe Di Natale (Inactive) [ 05/May/17 ] |
|
Thank you for the sample lnet.conf file. I'm going to be generating a patch to provide a sample lnet.conf and I'm noticing that it's going to require changes to init.d/lnet. The lnet init.d script relies on the existence of lnet.conf to determine if lnetctl should be used. I really don't want to hold this ticket up for that change... Should I go ahead and break that change out into its own ticket? |
| Comment by Gerrit Updater [ 05/May/17 ] |
|
Giuseppe Di Natale (dinatale2@llnl.gov) uploaded a new patch: https://review.whamcloud.com/26971 |
| Comment by Andreas Dilger [ 09/May/17 ] |
|
One option to handle this difference in the presence of lnet.conf would be to skip it if egrep -c -v "^#|^$" /etc/lnet.conf returns zero lines of real input. Not perfect, but it should handle the case of the example lnet.conf. |
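The check suggested above can be sketched as a small shell helper. This is only an illustration: has_real_config is a hypothetical name, not a function in init.d/lnet, and the pattern is the same one quoted in the comment, so an indented comment line would still count as real input.

```shell
# Return 0 if the given config file has at least one line that is
# neither empty nor starts with '#'; return 1 otherwise.
# has_real_config is a hypothetical helper, not part of init.d/lnet.
has_real_config() {
	local conf=$1
	local count

	# A missing or unreadable file counts as "no configuration".
	[ -r "$conf" ] || return 1

	# -v inverts the match, -c prints the number of remaining lines.
	count=$(grep -E -c -v '^#|^$' "$conf")
	[ "$count" -gt 0 ]
}
```

An init script could call has_real_config /etc/lnet.conf and only take the lnetctl path when it succeeds, so a sample file containing only comments would be treated the same as no file at all.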
| Comment by Gerrit Updater [ 12/May/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26959/ |
| Comment by Bob Glossman (Inactive) [ 15/May/17 ] |
|
Since landing in master of https://review.whamcloud.com/26959 a few days ago lustre_rmmod called with no arguments no longer works as expected. It's supposed to remove all lustre modules in that case. It doesn't. Example:

# lustre_rmmod
ERROR: Module ksocklnd is in use

It refuses to unload lnet modules due to ptlrpc still being loaded. The following shows what modules stay loaded after lustre_rmmod is run, and shows that after an explicit remove of ptlrpc lustre_rmmod then operates as expected:

# lsmod | more
Module                  Size  Used by
ksocklnd              187377  1
ptlrpc               2278586  0
obdclass             1785957  1 ptlrpc
lnet                  486107  3 ksocklnd,ptlrpc,obdclass
libcfs                393722  4 ksocklnd,ptlrpc,obdclass,lnet
sunrpc                261975  0
crc32c                 12759  0
ppdev                  17750  0
parport_pc             45587  0
.
.
# rmmod ptlrpc
# lustre_rmmod
# lsmod | more
Module                  Size  Used by
sunrpc                261975  0
crc32c                 12759  0
ppdev                  17750  0
parport_pc             45587  0
.
.
. |
| Comment by Giuseppe Di Natale (Inactive) [ 16/May/17 ] |
|
Bob, can you point me to some logs or provide more details? I can't reproduce the lustre_rmmod issue locally. |
| Comment by Bob Glossman (Inactive) [ 16/May/17 ] |
|
Reproduces 100% on sles11sp4 client. Another example:

sles11sp4gm:/home/bogl/lustre-release # mount -t lustre -o flock,user_xattr centos2:/lustre /mnt/lustre
sles11sp4gm:/home/bogl/lustre-release # umount /mnt/lustre
sles11sp4gm:/home/bogl/lustre-release # lustre_rmmod
ERROR: Module ksocklnd is in use
sles11sp4gm:/home/bogl/lustre-release # rmmod ptlrpc
sles11sp4gm:/home/bogl/lustre-release # lustre_rmmod
sles11sp4gm:/home/bogl/lustre-release # |
| Comment by Gerrit Updater [ 17/May/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26925/ |
| Comment by Bob Glossman (Inactive) [ 17/May/17 ] |
|
lustre_rmmod problem reproduces on el6 client too. Another example:

[root@centos69 x86_64]# mount -t lustre -o flock,user_xattr centos2:/lustre /mnt/lustre
[root@centos69 x86_64]# umount /mnt/lustre
[root@centos69 x86_64]# lustre_rmmod
ERROR: Module ksocklnd is in use
[root@centos69 x86_64]# rmmod ptlrpc
[root@centos69 x86_64]# lustre_rmmod

Can't make it happen on el7 or sles12sp2. |
| Comment by Cliff White (Inactive) [ 17/May/17 ] |
|
DDN-410 also appears to be related to this patch. |
| Comment by Giuseppe Di Natale (Inactive) [ 17/May/17 ] |
|
I may have already asked this, but is there a reason why we have a custom module removal script? Why aren't we just using `modprobe -r`? The tool along with the removal option exists in el6 and sles11. In the case of the init scripts, we call `modprobe -r ptlrpc`, then lctl/lnetctl, then `modprobe -r` the top module in the stack. |
| Comment by Christopher Morrone [ 17/May/17 ] |
|
Here is what the man page for modprobe -r says:

-r, --remove
This option causes modprobe to remove rather than insert a module. If the modules it depends on are also unused, modprobe will try to remove them too. Unlike insertion, more than one module can be specified on the command line (it does not make sense to specify module parameters when removing modules).

Note that it says "If the modules it depends on are also unused". It does not say "If there are modules that depend upon it, but they are unused, it removes those first". That is an important distinction. If there are modules using ptlrpc, then modprobe -r ptlrpc will not walk up the dependency tree (meaning in the direction of things that depend on ptlrpc) searching for a module that can be removed.

lustre_rmmod walks the tree of modules that depend upon the specified module, removing those first (if possible) so that it then becomes possible to remove the specified module. lustre_rmmod can potentially use modprobe -r at the various removal steps, but modprobe -r as described in the man page does not do what lustre_rmmod does.

In addition, lustre_rmmod was supposed to be smart enough to know that it may need to issue a command to stop networking before the lnet module can be removed.

It sounds to me (from comments from Bob and Cliff) that somewhere along the way lustre_rmmod was broken. |
| Comment by Giuseppe Di Natale (Inactive) [ 17/May/17 ] |
|
I still can't reproduce this on an el6 based machine. Haven't tried sles11 yet. Also, I need more info. Are you doing these tests with ldiskfs? Are you bringing lnet up before any of this? Can you also change the unload_dep_modules_inclusive function in lustre_rmmod to be the following:

# Unload all modules dependent on $1 (include removal of $1)
unload_dep_modules_inclusive() {
	local MODULE=$1
	# if $MODULE not loaded, return 0
	lsmod | egrep -q "^\<$MODULE\>" || return 0
	unload_dep_modules_exclusive $MODULE || return 1
	echo "Removing $MODULE"
	rmmod $MODULE || return 1
	return 0
}

That will give me a good idea on what order the modules are being removed in so we can eliminate a potential ordering problem. |
| Comment by Amir Shehata (Inactive) [ 18/May/17 ] |
|
The issue here is that ptlrpc is not being removed. ptlrpc takes a reference on lnet. If it doesn't release that reference, lnet cannot be unloaded.

lsmod | grep lnet

The current lustre_rmmod, after the patch in this ticket, grabs the list of modules which depend on lnet and tries to remove them first. However, it goes through them in the order listed above. So it tries to remove ko2iblnd first, but it can't, because networks are still loaded. When ptlrpc is removed first, it calls LNetNIFini(), which decrements the reference counter. This brings the reference counter on LNet to 0, which triggers the cleanup code to clean up the networks, routes, etc., allowing ko2iblnd to be unloaded and lustre_rmmod to succeed. The previous incarnation of lustre_rmmod took that into account and explicitly removed ptlrpc.

Simply removing the network by issuing "lnetctl lnet unconfigure" is not going to work either, because of the reference count taken by ptlrpc. In this case what you'd need to do is:

I don't think this is a reasonable process to expect people to go through to unload lustre. That's why lustre_rmmod was created (I believe, although that predates me). What lustre_rmmod ought to do is to know that ptlrpc needs to be unloaded to allow lnet and the lnds to be unloaded. |
| Comment by Amir Shehata (Inactive) [ 18/May/17 ] |
|
Giuseppe,

unload_dep_modules_exclusive() {
	local MODULE=$1
	local DEPS="$(lsmod | awk '($1 == "'$MODULE'") { print $4 }')"
	for SUBMOD in $(echo $DEPS | tr ',' ' '); do
		unload_dep_modules_inclusive $SUBMOD || return 1
	done
	return 0
}

This just grabs the output from lsmod, as I indicated above:

lsmod | grep lnet 483919 3 ko2iblnd,obdclass,ptlrpc

Is that different in el6 or sles11? More detail on the order of removal:

removing module: libcfs fid,fld,lmv,mdc,lov,lnet,ko2iblnd,lustre,obdclass,ptlrpc
removing module: lnet ko2iblnd,obdclass,ptlrpc |
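The ordering problem described here comes from walking the fourth lsmod column left to right. One way to picture a fix is a variant of that parsing which moves a chosen module to the front of the list. This is a sketch only, not what lustre_rmmod does: deps_in_removal_order is a hypothetical name, and it reads lsmod-style text from stdin so it can be tried without any modules loaded.

```shell
# Print the modules that use $1, one per line, reading lsmod-style
# text from stdin. If the module named in $2 appears in that list it
# is printed first; the others keep their original order.
# deps_in_removal_order is a hypothetical helper for illustration.
deps_in_removal_order() {
	local module=$1 first=$2

	# Column 4 of lsmod output is the comma-separated "Used by" list.
	awk -v m="$module" '$1 == m { print $4 }' | tr ',' '\n' |
		awk -v f="$first" '
			NF { if ($0 == f) print; else rest = rest $0 "\n" }
			END { printf "%s", rest }'
}
```

Fed the lnet line quoted above with ptlrpc as the second argument, it lists ptlrpc ahead of ko2iblnd and obdclass, which is the order the thread says is needed for lnet to be released.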
| Comment by Bob Glossman (Inactive) [ 18/May/17 ] |
I said it reproduces on a client. There are no ldiskfs modules loaded; those are only on servers. I am not bringing up or installing any modules before the 'mount' command shown in the examples. All the client lustre modules involved are loaded only by the mount. No modules are preloaded. There is no manual load or startup of LNET. No script based startup either, in init.d scripts for example.
As I already said I'm pretty sure lustre_rmmod was broken by the recent landing of https://review.whamcloud.com/26959, " |
| Comment by Gerrit Updater [ 18/May/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: https://review.whamcloud.com/27181 |
| Comment by Gerrit Updater [ 18/May/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27181/ |
| Comment by Andreas Dilger [ 18/May/17 ] |
|
I suspect that all that was needed here was to include ptlrpc in the list of modules being unloaded if no argument was given. That would ensure it is unloaded before LNet stops, and the rest of the unload could continue.

if [[ -z "$modules" || "$modules" == "ldiskfs" ]]; then
	modules="ptlrpc ldiskfs libcfs"
fi

Bob, Cliff, |
| Comment by Giuseppe Di Natale (Inactive) [ 18/May/17 ] |
|
Ok, after seeing the further details above, I agree with Andreas. It looks like a simple ordering problem. I agree with adding ptlrpc to the front of the modules list. Looking at the output of lsmod from my el6 machine, it appears ptlrpc always appears first in dependency listings. Please let me know if that fixes the issue and I'll fix up the patch. |
| Comment by Bob Glossman (Inactive) [ 18/May/17 ] |
|
I tried out Andreas' suggestion on el6 and sles11. It does fix the problem there. Haven't exhaustively tested elsewhere to make sure it doesn't break anything else. |
| Comment by Giuseppe Di Natale (Inactive) [ 18/May/17 ] |
|
Quick question, should I be submitting fixes for the reverted patches as new patches to gerrit? |
| Comment by Peter Jones [ 18/May/17 ] |
|
Yes I think so |
| Comment by Amir Shehata (Inactive) [ 19/May/17 ] |
|
Please note the same issue exists with lnet_selftest module. lnet_selftest depends on lnet, but the same ordering issue impacts its removal. It's not as critical as ptlrpc, but would be nice to get lustre_rmmod to handle it as well. |
| Comment by Giuseppe Di Natale (Inactive) [ 19/May/17 ] |
|
I'll add "lnet_selftest" to the list of modules. |
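Putting the two suggestions together, the default module selection might look like the sketch below. It is an assumption-laden illustration: default_modules is a hypothetical wrapper, and the position of lnet_selftest within the list is a guess; only the presence of ptlrpc at the front and the addition of lnet_selftest come from the discussion above.

```shell
# Echo the list of modules to unload. With no arguments, or with only
# "ldiskfs", fall back to a default list that starts with ptlrpc so
# its reference on lnet is dropped before the LNDs are removed.
# default_modules is a hypothetical wrapper; the exact placement of
# lnet_selftest in the list is an assumption.
default_modules() {
	local modules="$*"

	if [ -z "$modules" ] || [ "$modules" = "ldiskfs" ]; then
		modules="ptlrpc lnet_selftest ldiskfs libcfs"
	fi
	echo "$modules"
}
```

Calling default_modules with an explicit module name returns that name unchanged, so the default list only applies when the caller did not ask for something specific.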
| Comment by Gerrit Updater [ 19/May/17 ] |
|
Giuseppe Di Natale (dinatale2@llnl.gov) uploaded a new patch: https://review.whamcloud.com/27213 |
| Comment by Gerrit Updater [ 19/May/17 ] |
|
Giuseppe Di Natale (dinatale2@llnl.gov) uploaded a new patch: https://review.whamcloud.com/27214 |
| Comment by Gerrit Updater [ 03/Jun/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27213/ |
| Comment by Gerrit Updater [ 03/Jun/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26971/ |
| Comment by Peter Jones [ 03/Jun/17 ] |
|
Landed for 2.10 |
| Comment by Nathan Crawford [ 09/Jun/17 ] |
|
I believe the redirect on line 16 of the systemd lnet.service.in file causes failure on startup. Switching "lnetctl import < /etc/lnet.conf" to "lnetctl import /etc/lnet.conf" on the installed lnet.service file seems to work fine. If redirection is necessary for systemd service files, I've seen people do things like: -Nate |
| Comment by Amir Shehata (Inactive) [ 09/Jun/17 ] |
|
lnetctl handles both redirection and just providing it a file name directly. So that change proposed should work. |
| Comment by Giuseppe Di Natale (Inactive) [ 12/Jun/17 ] |
|
I'll go ahead and submit a patch to correct that today. |
| Comment by Giuseppe Di Natale (Inactive) [ 12/Jun/17 ] |
|
|
| Comment by Gerrit Updater [ 19/Jul/17 ] |
|
Dmitry Eremin (dmitry.eremin@intel.com) uploaded a new patch: https://review.whamcloud.com/28106 |
| Comment by Gerrit Updater [ 29/Jul/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28106/ |