[LU-9439] Introduce an lnet systemd service Created: 02/May/17  Updated: 07/Nov/18  Resolved: 03/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0, Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: Giuseppe Di Natale (Inactive) Assignee: Dmitry Eremin (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-7736 lustre_rmmod does not remove all the ... Resolved
Related
is related to LU-9655 Remove file redirection in lnet syste... Resolved
is related to LU-8384 convert startup scripts to systemd un... Resolved
is related to LU-6132 Unable to unload ib drivers with lust... Open

 Description   

This is effectively porting the init.d version of the lnet service to systemd. This ticket should:

  1. Create an lnet systemd unit file
  2. Correctly determine whether systemd is present on the target system, and set up the RPM to install the unit file and enable the service (one common detection approach is sketched below)

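For item 2, one common way to detect systemd and locate its unit directory at build time is via pkg-config (an illustrative sketch only, not necessarily how the patch implements it):

# Ask pkg-config where system units live; fall back to the init.d layout
# when systemd is absent.
if pkg-config --exists systemd; then
        unitdir=$(pkg-config --variable=systemdsystemunitdir systemd)
        echo "systemd detected, installing unit file into $unitdir"
else
        echo "no systemd found, keeping the init.d script only"
fi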

 Comments   
Comment by Gerrit Updater [ 02/May/17 ]

Giuseppe Di Natale (dinatale2@llnl.gov) uploaded a new patch: https://review.whamcloud.com/26925
Subject: LU-9439 scripts: lnet systemd service
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 064f80b504d6eb83c9ddca376bc3eb8aa0845e95

Comment by Giuseppe Di Natale (Inactive) [ 02/May/17 ]

I'd also like to provide a sample lnet.conf yaml file for lnetctl as part of this ticket. Could someone point me to a suitable example file?

Comment by Peter Jones [ 03/May/17 ]

Amir

Do you have a suitable example to share?

Peter

Comment by Amir Shehata (Inactive) [ 03/May/17 ]

I can provide an example, but the YAML file format has changed in the latest master. We still support the older format, but since this ticket is on master, below is a sample of the latest YAML config file.

net:
    - net type: o2ib1
      local NI(s):
        - nid: 172.16.1.4@o2ib1
          status: up
          interfaces:
              0: ib0
          statistics:
              send_count: 7
              recv_count: 7
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tcp bonding: 0
          dev cpt: 0
          CPT: "[0,1]"
        - nid: 172.16.2.4@o2ib1
          status: up
          interfaces:
              0: ib1
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          tcp bonding: 0
          dev cpt: 1
          CPT: "[0,1]"
route:
    - net: o2ib
      gateway: 172.16.1.1@o2ib1
      hop: -1
      priority: 0
      state: down
peer:
    - primary nid: 192.168.1.2@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 192.168.1.2@o2ib
          state: NA
          max_ni_tx_credits: 0
          available_tx_credits: 0
          min_tx_credits: 0
          tx_q_num_of_buf: 0
          available_rtr_credits: 0
          min_rtr_credits: 0
          send_count: 0
          recv_count: 0
          drop_count: 0
          refcount: 2
        - nid: 192.168.2.2@o2ib
          state: NA
          max_ni_tx_credits: 0
          available_tx_credits: 0
          min_tx_credits: 0
          tx_q_num_of_buf: 0
          available_rtr_credits: 0
          min_rtr_credits: 0
          send_count: 0
          recv_count: 0
          drop_count: 0
          refcount: 2
    - primary nid: 172.16.1.1@o2ib1
      Multi-Rail: True
      peer ni:
        - nid: 172.16.1.1@o2ib1
          state: up
          max_ni_tx_credits: 128
          available_tx_credits: 128
          min_tx_credits: 127
          tx_q_num_of_buf: 0
          available_rtr_credits: 128
          min_rtr_credits: 128
          send_count: 7
          recv_count: 7
          drop_count: 0
          refcount: 4
        - nid: 172.16.2.1@o2ib1
          state: NA
          max_ni_tx_credits: 128
          available_tx_credits: 128
          min_tx_credits: 127
          tx_q_num_of_buf: 0
          available_rtr_credits: 128
          min_rtr_credits: 128
          send_count: 0
          recv_count: 0
          drop_count: 0
          refcount: 1
Comment by Christopher Morrone [ 03/May/17 ]

Is that really an input file, or was that output? For instance, "status" doesn't seem like something that would appear in input.

Comment by Amir Shehata (Inactive) [ 04/May/17 ]

This is output. But the way it's designed, you can feed the output YAML config back in as input. The code will only look at the relevant parameters. Here is a cleaned input file with the unnecessary parameters removed:

net:
    - net type: o2ib1
      local NI(s):
        - nid: 172.16.1.4@o2ib1
          interfaces:
              0: ib0
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          CPT: "[0,1]"
        - nid: 172.16.2.4@o2ib1
          interfaces:
              0: ib1
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
          lnd tunables:
              peercredits_hiw: 64
              map_on_demand: 32
              concurrent_sends: 256
              fmr_pool_size: 2048
              fmr_flush_trigger: 512
              fmr_cache: 1
          CPT: "[0,1]"
route:
    - net: o2ib
      gateway: 172.16.1.1@o2ib1
      hop: -1
      priority: 0
peer:
    - primary nid: 192.168.1.2@o2ib
      Multi-Rail: True
      peer ni:
        - nid: 192.168.1.2@o2ib
        - nid: 192.168.2.2@o2ib
    - primary nid: 172.16.1.1@o2ib1
      Multi-Rail: True
      peer ni:
        - nid: 172.16.1.1@o2ib1
        - nid: 172.16.2.1@o2ib1
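
For reference, the round trip described above can be exercised like this (a sketch, assuming the lnetctl export subcommand is available in this version):

# Dump the running configuration (the verbose output form above), then
# feed it straight back in; only the relevant parameters are used.
lnetctl export > /tmp/lnet-running.yaml
lnetctl import /tmp/lnet-running.yaml
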
Comment by Gerrit Updater [ 05/May/17 ]

Giuseppe Di Natale (dinatale2@llnl.gov) uploaded a new patch: https://review.whamcloud.com/26959
Subject: LU-9439 scripts: Change behavior of lustre_rmmod
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 18e36250bd06605e20f3c42ae4802ee428c27f57

Comment by Giuseppe Di Natale (Inactive) [ 05/May/17 ]

Thank you for the sample lnet.conf file. I'm going to be generating a patch to provide a sample lnet.conf, and I'm noticing that it's going to require changes to init.d/lnet. The lnet init.d script relies on the existence of lnet.conf to determine if lnetctl should be used. I really don't want to hold this ticket up for that change... Should I go ahead and break that change out into its own ticket?

Comment by Gerrit Updater [ 05/May/17 ]

Giuseppe Di Natale (dinatale2@llnl.gov) uploaded a new patch: https://review.whamcloud.com/26971
Subject: LU-9439 scripts: Provide a sample lnet.conf file
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bb5954645a672b261a4e73edd76a29e79ce3542d

Comment by Andreas Dilger [ 09/May/17 ]

One option to handle this difference in the presence of lnet.conf would be to skip it if egrep -c -v "^#|^$" /etc/lnet.conf returns zero lines of real input. Not perfect, but it should handle the case of the example lnet.conf.
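
A minimal sketch of that check, assuming the init script imports /etc/lnet.conf through lnetctl (names and paths here are illustrative):

# Only treat /etc/lnet.conf as real configuration if it contains at least
# one line that is neither a comment nor blank.
if [ -f /etc/lnet.conf ] &&
   [ "$(egrep -c -v '^#|^$' /etc/lnet.conf)" -gt 0 ]; then
        lnetctl import /etc/lnet.conf
fi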

Comment by Gerrit Updater [ 12/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26959/
Subject: LU-9439 scripts: Change behavior of lustre_rmmod
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 645153be3eb1fd8c634717507f73d85625d1b84a

Comment by Bob Glossman (Inactive) [ 15/May/17 ]

Since https://review.whamcloud.com/26959 landed in master a few days ago, lustre_rmmod called with no arguments no longer works as expected. It's supposed to remove all Lustre modules in that case, but it doesn't. Example:

# lustre_rmmod
ERROR: Module ksocklnd is in use

It refuses to unload the lnet modules because ptlrpc is still loaded. The following shows which modules stay loaded after lustre_rmmod is run, and that after an explicit removal of ptlrpc, lustre_rmmod then operates as expected:

# lsmod | more
Module                  Size  Used by
ksocklnd              187377  1 
ptlrpc               2278586  0 
obdclass             1785957  1 ptlrpc
lnet                  486107  3 ksocklnd,ptlrpc,obdclass
libcfs                393722  4 ksocklnd,ptlrpc,obdclass,lnet
sunrpc                261975  0 
crc32c                 12759  0 
ppdev                  17750  0 
parport_pc             45587  0 
  .  
  .
# rmmod ptlrpc
# lustre_rmmod
# lsmod | more
Module                  Size  Used by
sunrpc                261975  0 
crc32c                 12759  0 
ppdev                  17750  0 
parport_pc             45587  0 
  .
  .
  .
Comment by Giuseppe Di Natale (Inactive) [ 16/May/17 ]

Bob, can you point me to some logs or provide more details? I can't reproduce the lustre_rmmod issue locally.

Comment by Bob Glossman (Inactive) [ 16/May/17 ]

Reproduces 100% on a sles11sp4 client. Another example:

sles11sp4gm:/home/bogl/lustre-release # mount -t lustre -o flock,user_xattr centos2:/lustre /mnt/lustre
sles11sp4gm:/home/bogl/lustre-release # umount /mnt/lustre
sles11sp4gm:/home/bogl/lustre-release # lustre_rmmod
ERROR: Module ksocklnd is in use
sles11sp4gm:/home/bogl/lustre-release # rmmod ptlrpc
sles11sp4gm:/home/bogl/lustre-release # lustre_rmmod
sles11sp4gm:/home/bogl/lustre-release # 
Comment by Gerrit Updater [ 17/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26925/
Subject: LU-9439 scripts: lnet systemd service
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 32d1a1c5d610d054ad4609c1cf332172e8310805

Comment by Bob Glossman (Inactive) [ 17/May/17 ]

The lustre_rmmod problem reproduces on an el6 client too. Another example:

[root@centos69 x86_64]# mount -t lustre -o flock,user_xattr centos2:/lustre /mnt/lustre
[root@centos69 x86_64]# umount /mnt/lustre
[root@centos69 x86_64]# lustre_rmmod
ERROR: Module ksocklnd is in use
[root@centos69 x86_64]# rmmod ptlrpc
[root@centos69 x86_64]# lustre_rmmod

Can't make it happen on el7 or sles12sp2.

Comment by Cliff White (Inactive) [ 17/May/17 ]

DDN-410 also appears to be related to this patch.
We also see the issue on the soak/spirit clusters.
We reproduce the issue in the same way as Bob above.
I do not like this change; I don't see how you can unload all modules without multiple invocations of lustre_rmmod. The original intent of lustre_rmmod was to have a script that scrubbed everything, always, because it's simple and stupid and works. Smart admins who only want to remove one module can use the lnetctl and rmmod commands without this script.

Comment by Giuseppe Di Natale (Inactive) [ 17/May/17 ]

I may have already asked this, but is there a reason why we have a custom module removal script? Why aren't we just using `modprobe -r`? The tool along with the removal option exists in el6 and sles11. In the case of the init scripts, we call `modprobe -r ptlrpc`, then lctl/lnetctl, then `modprobe -r` the top module in the stack.

Comment by Christopher Morrone [ 17/May/17 ]

Here is what the man page for modprobe -r says:

 -r, --remove
 This option causes modprobe to remove rather than insert a module. If the modules it
 depends on are also unused, modprobe will try to remove them too. Unlike insertion,
 more than one module can be specified on the command line (it does not make sense to
 specify module parameters when removing modules).

Note that it says "If the modules it depends on are also unused". It does not say "If there are modules that depend upon it, but they are unused, it removes those first". That is an important distinction. If there are modules using ptlrpc, then modprobe -r ptlrpc will not walk up the dependency tree (in the direction of things that depend on ptlrpc) searching for a point where it finds a module that can be removed.

lustre_rmmod walks the tree of modules that depend upon the specified module, removing those first (if possible) so that it then becomes possible to remove the specified module. lustre_rmmod could potentially use modprobe -r at the various removal steps, but modprobe -r as described in the man page does not do what lustre_rmmod does.

In addition, lustre_rmmod was supposed to be smart enough to know that it may need to issue a command to stop networking before the lnet module can be removed. It sounds to me (from comments from Bob and Cliff) that somewhere along the way lustre_rmmod was broken.
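
For illustration, the walk described above amounts to something like the following (a rough sketch, not the shipped lustre_rmmod):

# Recursively unload every module that uses $1, then $1 itself.
unload_with_users() {
        local mod=$1
        local users u
        lsmod | grep -q "^$mod " || return 0    # not loaded, nothing to do
        users=$(lsmod | awk -v m="$mod" '$1 == m { print $4 }' | tr ',' ' ')
        for u in $users; do
                unload_with_users "$u" || return 1    # climb toward dependers first
        done
        rmmod "$mod"
}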

Comment by Giuseppe Di Natale (Inactive) [ 17/May/17 ]

I still can't reproduce this on an el6 based machine. Haven't tried sles11 yet.

Also, I need more info. Are you doing these tests with ldiskfs? Are you bringing lnet up before any of this?

Can you also change the unload_dep_modules_inclusive function in lustre_rmmod to be the following:

# Unload all modules dependent on $1 (include removal of $1)
unload_dep_modules_inclusive() {
    local MODULE=$1

    # if $MODULE not loaded, return 0
    lsmod | egrep -q "^\<$MODULE\>" || return 0
    unload_dep_modules_exclusive $MODULE || return 1
    echo "Removing $MODULE"
    rmmod $MODULE || return 1
    return 0
}

That will give me a good idea of the order the modules are being removed in, so we can rule out a potential ordering problem.

Comment by Amir Shehata (Inactive) [ 18/May/17 ]

The issue here is that ptlrpc is not being removed. ptlrpc takes a reference on lnet, and if it doesn't release that reference, lnet cannot be unloaded.

lsmod | grep lnet
lnet                  483919  3 ko2iblnd,obdclass,ptlrpc

The current lustre_rmmod, after the patch in this ticket, grabs the list of modules which depend on lnet and tries to remove them first. However, it goes through them in the order listed above, so it tries to remove ko2iblnd first, but it can't, because networks are still loaded. When ptlrpc is removed first, it calls LNetNIFini(), which decrements the reference counter. This brings the reference counter on LNet to 0, which triggers the cleanup code to clean up the networks, routes, etc., allowing ko2iblnd to be unloaded and lustre_rmmod to succeed.

The previous incarnation of lustre_rmmod took that into account and explicitly removed ptlrpc.

Simply removing the network by issuing "lnetctl lnet unconfigure" is not going to work either, because of the reference count taken by ptlrpc.

In this case what you'd need to do is (sketched as commands below):
1. Bring down all networks manually using "lnetctl net del"
-> This step essentially removes the dependency between the lnet module and ko2iblnd (or other LNDs)
2. lustre_rmmod
-> This will succeed because there is nothing hindering ko2iblnd from being unloaded, and then ptlrpc will be unloaded as well, releasing the final reference held on lnet and allowing lnet to be unloaded.
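
As concrete commands (the o2ib1 network name is only an example; repeat for each configured network):

lnetctl net del --net o2ib1    # step 1: detach the LND from the network
lustre_rmmod                   # step 2: ko2iblnd, ptlrpc, and finally lnet unload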

I don't think this is a reasonable process to expect people to go through to unload Lustre. That's why lustre_rmmod was created (I believe, although that predates me).

What lustre_rmmod ought to do is know that ptlrpc needs to be unloaded first, to allow lnet and the LNDs to be unloaded.

Comment by Amir Shehata (Inactive) [ 18/May/17 ]

Giuseppe,
I think the function of interest is:

unload_dep_modules_exclusive() {
        local MODULE=$1
        local DEPS="$(lsmod | awk '($1 == "'$MODULE'") { print $4 }')"
        for SUBMOD in $(echo $DEPS | tr ',' ' '); do
                unload_dep_modules_inclusive $SUBMOD || return 1
        done
        return 0
}

This just grabs the output from lsmod, as I indicated above:

lsmod | grep lnet
lnet                  483919  3 ko2iblnd,obdclass,ptlrpc

Is that different in el6 or sles11?

More detail on the order of removal:

removing module:  libcfs
fid,fld,lmv,mdc,lov,lnet,ko2iblnd,lustre,obdclass,ptlrpc

removing module:  lnet
ko2iblnd,obdclass,ptlrpc
Comment by Bob Glossman (Inactive) [ 18/May/17 ]

> I still can't reproduce this on an el6 based machine. Haven't tried sles11 yet.
> Also, I need more info. Are you doing these tests with ldiskfs? Are you bringing lnet up before any of this?

I said it reproduces on a client. There are no ldiskfs modules loaded; those are only on servers.

I am not bringing up or installing any modules before the 'mount' command shown in the examples. All the client Lustre modules involved are loaded only by the mount. No modules are preloaded. There is no manual load or startup of LNET, and no script-based startup either, e.g. via init.d scripts.

> It sounds to me (from comments from Bob and Cliff) that somewhere along the way lustre_rmmod was broken.

As I already said, I'm pretty sure lustre_rmmod was broken by the recent landing of https://review.whamcloud.com/26959, "LU-9439 scripts: Change behavior of lustre_rmmod". Before that change it worked correctly.

Comment by Gerrit Updater [ 18/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: https://review.whamcloud.com/27181
Subject: Revert "LU-9439 scripts: Change behavior of lustre_rmmod"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5e47c0de060f71f809dab69adafa1c814b4ad253

Comment by Gerrit Updater [ 18/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27181/
Subject: Revert "LU-9439 scripts: Change behavior of lustre_rmmod"
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0bc19643b94f0adf28db365a07bcefeff4ebc51d

Comment by Andreas Dilger [ 18/May/17 ]

I suspect that all that was needed here was to include ptlrpc in the list of modules being unloaded if no argument was given. That would ensure it is unloaded before LNet is stopped, and the rest of the unload can continue.

        if [[ -z "$modules" || "$modules" == "ldiskfs" ]]; then
                modules="ptlrpc ldiskfs libcfs"
        fi

Bob, Cliff,
Could you please give this a try on the un-reverted script to see if it solves your problem? If yes, please fix up and resubmit the patch for landing.

Comment by Giuseppe Di Natale (Inactive) [ 18/May/17 ]

OK, after seeing the further details above, I agree with Andreas. It looks like a simple ordering problem is the cause. I agree with adding ptlrpc to the front of the modules list.

Looking at the output of lsmod on my el6 machine, ptlrpc always appears first in the dependency listings.

Please let me know if that fixes the issue and I'll fix up the patch.

Comment by Bob Glossman (Inactive) [ 18/May/17 ]

I tried out Andreas' suggestion on el6 and sles11. It does fix the problem there. Haven't exhaustively tested elsewhere to make sure it doesn't break anything else.

Comment by Giuseppe Di Natale (Inactive) [ 18/May/17 ]

Quick question, should I be submitting fixes for the reverted patches as new patches to gerrit?

Comment by Peter Jones [ 18/May/17 ]

Yes I think so

Comment by Amir Shehata (Inactive) [ 19/May/17 ]

Please note the same issue exists with the lnet_selftest module: it depends on lnet, and the same ordering issue impacts its removal.

It's not as critical as ptlrpc, but it would be nice to have lustre_rmmod handle it as well.

Comment by Giuseppe Di Natale (Inactive) [ 19/May/17 ]

I'll add "lnet_selftest" to the list of modules.
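
Roughly, combined with Andreas' earlier snippet, that would look like this (a sketch, not necessarily the exact change that lands):

if [[ -z "$modules" || "$modules" == "ldiskfs" ]]; then
        # force ptlrpc and lnet_selftest out before the lnet/LND stack
        modules="ptlrpc lnet_selftest ldiskfs libcfs"
fi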

Comment by Gerrit Updater [ 19/May/17 ]

Giuseppe Di Natale (dinatale2@llnl.gov) uploaded a new patch: https://review.whamcloud.com/27213
Subject: LU-9439 scripts: Change behavior of lustre_rmmod
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d71489e1d8f6da11e52f965e8cd7b6f87a17252d

Comment by Gerrit Updater [ 19/May/17 ]

Giuseppe Di Natale (dinatale2@llnl.gov) uploaded a new patch: https://review.whamcloud.com/27214
Subject: LU-9439 scripts: lnet systemd service
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1656727e8ab2fa2b5d29d7f356f2c45131db6bae

Comment by Gerrit Updater [ 03/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27213/
Subject: LU-9439 scripts: Change behavior of lustre_rmmod
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c6e5f4069edaecb8461df2d03566bd5e333b8a5c

Comment by Gerrit Updater [ 03/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26971/
Subject: LU-9439 scripts: Provide a sample lnet.conf file
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 25ee73e7733214f7a46f81b2540b7fca67b0eef1

Comment by Peter Jones [ 03/Jun/17 ]

Landed for 2.10

Comment by Nathan Crawford [ 09/Jun/17 ]

I believe the redirect on line 16 of the systemd lnet.service.in file causes a failure on startup. Switching "lnetctl import < /etc/lnet.conf" to "lnetctl import /etc/lnet.conf" in the installed lnet.service file seems to work fine.

If redirection is necessary for systemd service files, I've seen people do things like:
ExecStart=/bin/sh -c '/usr/sbin/lnetctl import < /etc/lnet.conf'

-Nate

Comment by Amir Shehata (Inactive) [ 09/Jun/17 ]

lnetctl handles both redirection and being given a file name directly, so the proposed change should work.
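
That is, both of these forms are equivalent as far as lnetctl is concerned:

lnetctl import /etc/lnet.conf      # file name argument
lnetctl import < /etc/lnet.conf    # stdin redirection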

Comment by Giuseppe Di Natale (Inactive) [ 12/Jun/17 ]

I'll go ahead and submit a patch to correct that today.

Comment by Giuseppe Di Natale (Inactive) [ 12/Jun/17 ]

LU-9655 for the file redirection fix.

Comment by Gerrit Updater [ 19/Jul/17 ]

Dmitry Eremin (dmitry.eremin@intel.com) uploaded a new patch: https://review.whamcloud.com/28106
Subject: LU-9439 scripts: add lnet script in .gitignore
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 67305d237d75b608f49a7e264b6cef971e8c7494

Comment by Gerrit Updater [ 29/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28106/
Subject: LU-9439 scripts: add lnet script in .gitignore
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6ca43d539b2856d68d330023f04643f9e09a8cfa
