Lustre / LU-9525

"lctl network down" won't work if network brought up by a mount

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Severity: 3

    Description

      The code for both "lctl network down" and "lnetctl net unconfigure" was written on the assumption that the network was originally brought up with either "lctl network configure" or "lnetctl net configure". If the network came up as part of a mount or "modprobe lustre", a flag is not set, and that prevents lctl or lnetctl from bringing the network down.

      This enforces a restriction that the mechanism which configures the network must be the same one which unconfigures it. I see no reason for that restriction; it is confusing and should not be there.

      This ticket has been opened to change that part of the code so that the network can be unconfigured by a different mechanism from the one that configured it. Otherwise, debugging in the field is just no fun!
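To make the reported restriction concrete, here is a toy Python model of the behavior exactly as this description states it. It is a sketch under stated assumptions, not the Lustre implementation: the class, its methods, and the `configured_by_tool` attribute are hypothetical stand-ins for the unnamed flag mentioned above. Returning `-errno.EBUSY` is an assumption based on the "LNET busy" / `rc=... : -16` output that appears later in this ticket.

```python
import errno


# Toy model of the restriction as described in this ticket.
# All names are hypothetical; this is NOT the Lustre kernel code.
class ToyLNet:
    def __init__(self):
        self.up = False
        # Stand-in for the flag from the description: set only when
        # "lctl network configure" / "lnetctl net configure" brought LNet up.
        self.configured_by_tool = False

    def configure(self, via_tool):
        """Bring the network up via lctl/lnetctl (True) or via a mount or
        "modprobe lustre" (False)."""
        self.up = True
        self.configured_by_tool = via_tool

    def unconfigure(self):
        """Model of "lctl network down" / "lnetctl net unconfigure"."""
        if not self.up:
            return 0
        if not self.configured_by_tool:
            # Brought up by a different mechanism: the tool refuses.
            return -errno.EBUSY
        self.up = False
        self.configured_by_tool = False
        return 0


if __name__ == "__main__":
    mounted = ToyLNet()
    mounted.configure(via_tool=False)         # network came up with a mount
    print(mounted.unconfigure(), mounted.up)  # -16 True (EBUSY is 16 on Linux)

    tooled = ToyLNet()
    tooled.configure(via_tool=True)           # "lctl network configure"
    print(tooled.unconfigure(), tooled.up)    # 0 False
```

In this model the only thing deciding the outcome is which path set the flag, which is the coupling the description objects to.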


        Activity


          Patrick Farrell (Inactive) added a comment:

          Per our conversation this morning, it looks like IML will be able to do what it needs to do once LU-11986 is fixed (that fix is in progress and a patch exists).
          Joe Grund added a comment:

          Sounds like this is expected behavior. I'll try using lustre_rmmod ptlrpc unconditionally before bringing down LNet.

          Patrick Farrell (Inactive) added a comment (edited):

          I spent a while on this before realizing that, as far as I can tell, the code is fine, and I can't reproduce the reported behavior:

          # modprobe lnet
          # lctl list_nids
          IOC_LIBCFS_GET_NI error 100: Network is down
          # lctl net up
          # lctl list_nids
          192.168.56.20@tcp
          # lctl net down
          LNET busy

          "lctl net down" works fine for me in this scenario, both on master and on b2_12.

          I also can't reproduce the problem Doug described with the configuration flag. It's true that you cannot unconfigure LNet while ptlrpc is loaded, but that's because ptlrpc won't load without network interfaces (NIs) available. ptlrpc doesn't support being loaded without NIs configured, so you can't unconfigure them without unloading it.

          But it's not just "a flag that's set": that flag records whether LNet initialized its own configuration or whether ptlrpc did. If ptlrpc did it, LNet can't just tear it down. Every scenario and ordering I could think of (configuration done by LNet, or done by ptlrpc, which is the same as "done by a mount") works fine, provided you accept that you can't unconfigure LNet while ptlrpc is loaded.

          Maybe IML is doing something that keeps LNet busy? Because this code seems to be fine.
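The explanation above can be sketched as a second toy model: LNet counts its users, ptlrpc (loaded by a mount) holds one reference, and the flag only records whether the lctl/lnetctl path holds a reference of its own. Unconfiguring drops that self-owned reference, if any, and reports busy while any other user remains, so the network only goes away once ptlrpc is unloaded (for example by lustre_rmmod). The class and method names are hypothetical and the reference-counting structure is an assumption made for illustration; this is not the Lustre source.

```python
import errno


# Toy model of the comment above. Hypothetical names; a sketch,
# not the Lustre kernel implementation.
class ToyLNetRefs:
    def __init__(self):
        self.refcount = 0          # users currently keeping LNet up
        self.self_owned = False    # does lctl/lnetctl hold its own reference?
        self.ptlrpc_loaded = False

    @property
    def up(self):
        return self.refcount > 0

    def tool_configure(self):
        """Model of "lctl net up": take one self-owned reference."""
        if not self.self_owned:
            self.self_owned = True
            self.refcount += 1

    def ptlrpc_load(self):
        """Model of ptlrpc loading (e.g. via a mount); it needs LNet up."""
        if not self.ptlrpc_loaded:
            self.ptlrpc_loaded = True
            self.refcount += 1

    def ptlrpc_unload(self):
        """Model of unloading ptlrpc (e.g. lustre_rmmod)."""
        if self.ptlrpc_loaded:
            self.ptlrpc_loaded = False
            self.refcount -= 1

    def tool_unconfigure(self):
        """Model of "lctl net down": drop only the tool's own reference,
        then report busy if anyone else still holds LNet."""
        if self.self_owned:
            self.self_owned = False
            self.refcount -= 1
        return 0 if self.refcount == 0 else -errno.EBUSY


if __name__ == "__main__":
    lnet = ToyLNetRefs()
    lnet.ptlrpc_load()                        # network brought up by a mount
    print(lnet.tool_unconfigure())            # -16: ptlrpc still holds LNet
    lnet.ptlrpc_unload()                      # e.g. lustre_rmmod
    print(lnet.tool_unconfigure(), lnet.up)   # 0 False
```

Under this model the order matters, not the mechanism: unloading ptlrpc first and then running "lctl net down" always succeeds, which matches the teardown order discussed in this thread.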

          Nathaniel Clark added a comment:

          Doing lustre_rmmod tears everything down. Doing a partial teardown results in a panic (probably because of polling by iml-agent); see LU-11986.

          Patrick Farrell (Inactive) added a comment:

          Nathaniel,

          What happens if you run lustre_rmmod in there, like we discussed on Skype? (Not saying you should have to, just curious what happens.)

          Nathaniel Clark added a comment:

          With Lustre 2.12.0 it doesn't appear there's any way for "lctl net down" to work:

          # modprobe lnet
          # lctl list_nids
          IOC_LIBCFS_GET_NI error 100: Network is down
          # lctl net up
          # lctl list_nids
          192.168.56.20@tcp
          # lctl net down
          LNET busy

          Debug log with all debugging enabled:

          # lctl set_param debug=-1
          debug=-1
          # lctl dk > /dev/null ; lctl net down; lctl dk
          LNET busy
          00000400:00000001:1.0F:1550767099.121831:0:5433:0:(module.c:69:libcfs_ioctl()) Process entered
          00000400:00000001:1.0:1550767099.121833:0:5433:0:(linux-module.c:113:libcfs_ioctl_getdata()) Process entered
          00000400:00000010:1.0:1550767099.121834:0:5433:0:(linux-module.c:136:libcfs_ioctl_getdata()) alloc '(*hdr_pp)': 128 at ffffa0cd5829d900 (tot 502104).
          00000400:00000001:1.0:1550767099.121835:0:5433:0:(linux-module.c:143:libcfs_ioctl_getdata()) Process leaving (rc=0 : 0 : 0)
          00000400:00000001:1.0:1550767099.121836:0:5433:0:(linux-module.c:91:libcfs_ioctl_data_adjust()) Process entered
          00000400:00000001:1.0:1550767099.121837:0:5433:0:(linux-module.c:105:libcfs_ioctl_data_adjust()) Process leaving (rc=0 : 0 : 0)
          00000400:00000080:1.0:1550767099.121837:0:5433:0:(module.c:90:libcfs_ioctl()) libcfs ioctl cmd 3221775672
          00000400:00000010:1.0:1550767099.121839:0:5433:0:(module.c:118:libcfs_ioctl()) kfreed 'hdr': 128 at ffffa0cd5829d900 (tot 501976).
          00000400:00000001:1.0:1550767099.121840:0:5433:0:(module.c:119:libcfs_ioctl()) Process leaving (rc=18446744073709551600 : -16 : fffffffffffffff0)
          Debug log: 9 lines, 9 kept, 0 dropped, 0 bad.
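The numbers in this trace can be decoded by hand: the final `rc=18446744073709551600` is the unsigned 64-bit rendering of `-16` (the log itself shows `-16` and `fffffffffffffff0` alongside it), and 16 is `EBUSY`, which lctl reports as "LNET busy". A small sketch of that conversion, plus a decoder for the `libcfs ioctl cmd 3221775672` line, follows. The decoder assumes the generic Linux `_IOC` bit layout used on x86_64 (2 direction bits, 14 size bits, 8 type bits, 8 number bits); it does not try to map the result to a specific libcfs ioctl name, and the helper names are hypothetical.

```python
import errno


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value, as printed in the debug log,
    as a signed two's-complement integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def decode_ioctl(cmd):
    """Split an ioctl number using the generic Linux _IOC layout
    (an assumption: some architectures use different field widths)."""
    return {
        "dir": (cmd >> 30) & 0x3,         # 1 = write, 2 = read, 3 = read/write
        "size": (cmd >> 16) & 0x3FFF,     # argument size in bytes
        "type": chr((cmd >> 8) & 0xFF),   # "magic" type character
        "nr": cmd & 0xFF,                 # command number within that type
    }


if __name__ == "__main__":
    rc = as_signed64(18446744073709551600)
    print(rc, errno.errorcode[-rc])    # -16 EBUSY
    print(decode_ioctl(3221775672))    # {'dir': 3, 'size': 8, 'type': 'e', 'nr': 56}
```

The same decoding also explains the allocation lines: `tot 502104` minus the 128-byte `hdr` freed at the end gives the `tot 501976` reported by the `kfreed` line.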

          People

            Assignee: Patrick Farrell (Inactive)
            Reporter: Doug Oucharek (Inactive)
            Votes: 0
            Watchers: 7
