
[LU-11840] Multi-rail dynamic discovery prevents mounting the filesystem when a NIC is unreachable

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0
    • Severity: 3

    Description

      In recent Lustre releases, some filesystems cannot be mounted due to a communication error between clients and servers, depending on the LNet configuration.

      Consider a filesystem running on a host with 2 interfaces, say tcp0 and tcp1, where the devices are set up to reply on both interfaces (formatted with --servicenode IP1@tcp0,IP2@tcp1).

      If a client that is connected only to tcp0 tries to mount this filesystem, the mount fails with an I/O error because the client tries to connect using the tcp1 interface.
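      For reference, a minimal sketch of how such a target could be formatted; the device path, filesystem name, role flags and IP addresses below are placeholders, not values from this ticket:

      # Hypothetical example: format a target that answers on two LNet
      # networks (tcp0 and tcp1); all names and addresses are placeholders.
      mkfs.lustre --fsname=lustre --mgs --mdt --index=0 \
          --servicenode=192.168.1.10@tcp0,10.0.0.10@tcp1 \
          /dev/sdb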

      Mount failed:

       

      # mount -t lustre x.y.z.a@tcp:/lustre /mnt/lustre
      mount.lustre: mount x.y.z.a@tcp:/lustre at /mnt/client failed: Input/output error
      Is the MGS running?
      

      dmesg shows that the communication fails using the wrong NID:

      [422880.743179] LNetError: 19787:0:(lib-move.c:1714:lnet_select_pathway()) no route to a.b.c.d@tcp1
      # lnetctl peer show
      peer:
       - primary nid: a.b.c.d@tcp1
       Multi-Rail: False
       peer ni:
       - nid: x.y.z.a@tcp
       state: NA
       - nid: 0@<0:0>
       state:

      Ping is OK though:

      # lctl ping x.y.z.a@tcp
      12345-0@lo
      12345-a.b.c.d@tcp1
      12345-x.y.z.a@tcp

       

      This was tested with 2.10.5 and 2.12 as server versions and 2.10, 2.11 and 2.12 as client versions.

      Only the 2.10 client is able to mount the filesystem properly with this configuration.

       

      I git-bisected the regression down to commit 0f1aaad "LU-9480 lnet: implement Peer Discovery".

      Looking at the debug log, the client:

      • sets up the peer with the proper NI
      • then pings the peer
      • then updates the local peer info with the wrong NI coming from the ping reply

      The data in the reply seems to announce the tcp1 IP as the primary NID.

      The client will then use this NID to contact the server, even though it has no direct connection to it (tcp1) and does have a working one to the same peer (tcp0).
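      A rough way to observe this behaviour from the client, sketched with the placeholder NIDs used above (this assumes LNet peer discovery is enabled, as it is by default since 2.11):

      # Sketch only: watch the local peer table change when discovery runs.
      lnetctl peer show            # before: the peer is keyed by x.y.z.a@tcp
      lctl ping x.y.z.a@tcp        # first traffic to the peer triggers discovery
      lnetctl peer show            # after: the primary nid becomes a.b.c.d@tcp1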


          Activity


            For the record, disabling LNet discovery seems to work around the issue:

            lnetctl set discovery 0

            before mounting the Lustre client.

            degremoa Aurelien Degremont (Inactive) added a comment
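            A minimal sketch of that workaround, using the x.y.z.a@tcp NID from the description; the lnetctl global show step is only there to confirm the setting and is not from the original comment:

            lnetctl set discovery 0      # disable LNet peer discovery on the client
            lnetctl global show          # discovery should now be reported as 0
            mount -t lustre x.y.z.a@tcp:/lustre /mnt/lustre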

            Really helpful. Thank you!

            degremoa Aurelien Degremont (Inactive) added a comment

            No. 0@lo will always get ignored because it's created implicitly. So you don't have to have it in the lnet.conf file.

            There is actually a patch

            LU-10452 lnet: cleanup YAML output

            which allows you to use a "--backup" option to print a YAML block with only the elements needed to reconfigure a system.

            lnetctl net show --backup

            # also, when you export, the backup feature is automatically set
            lnetctl export > lnet.conf
            ashehata Amir Shehata (Inactive) added a comment - edited
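            As a usage note, a sketch of the round trip this enables (the lnet.conf file name is simply the one used above):

            lnetctl export > lnet.conf   # save the running configuration
            lnetctl import < lnet.conf   # re-apply it later on the same node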

            OK, understood.

            A simple question based on your config output: should we declare 0@lo in an lnet.conf file used with lnetctl import?

            I could not find a clear statement on that in the various places I looked.

            degremoa Aurelien Degremont (Inactive) added a comment

            I don't think it's a huge amount of work, but I am focused on 2.13 feature work at the moment, so I have not looked at it in much detail yet.

            ashehata Amir Shehata (Inactive) added a comment

            Thanks a lot! Do you have a rough idea if this is days or weeks of work?

            degremoa Aurelien Degremont (Inactive) added a comment

            I'm working on a solution. Will update the ticket when I have a patch to test.

            ashehata Amir Shehata (Inactive) added a comment

            Yes. I believe that's the issue I pointed to here: https://jira.whamcloud.com/browse/LU-11840?focusedCommentId=240077&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-240077

            Lustre (not LNet) does its own NID lookup based on the logs. The assumption inherent in the code is that there is only one NID per node, which is not right.

            ashehata Amir Shehata (Inactive) added a comment

            @ashehata Did you make any progress on this topic?

            I'm facing a similar issue with a pure 2.10.5 configuration.

            Lustre servers have both tcp0 and tcp1 NIDs, and the MDTs/OSTs are set up to use both of them. But the Lustre servers will try to communicate using only the first configured interface; if that fails (timeout), they never try the second one.

            Do you have any clue?

            degremoa Aurelien Degremont (Inactive) added a comment - edited

            My LNet setup looks like the MDT one. There are 2 LNDs, tcp0 and tcp1, with only one interface for each of them.

            We did the test together on a simple system where both the MDT and OST were on the same server, but I do not think this makes a difference here.

            Looking at my MGS client llog, it looks more like case #1.

            Devices were formatted specifying a simple service node option (see ticket description).

            degremoa Aurelien Degremont (Inactive) added a comment
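            For reference, a sketch of the kind of two-network, one-interface-per-network client setup described here; the interface names are placeholders:

            # Configure LNet with one local NI on each network (placeholder
            # interface names).
            lnetctl lnet configure
            lnetctl net add --net tcp --if eth0    # tcp is the same network as tcp0
            lnetctl net add --net tcp1 --if eth1
            lnetctl net show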

            As discussed today, the workaround where you configure the tcp NID to be primary on the server will work in your case.

            In the meantime I've been looking at a way to resolve the incompatibility between a discovery-enabled node and a non-discovery-capable node (i.e. 2.10.x), and I have hit a snag.

            I'm testing two different scenarios:

            1. OST(2.12) MDT(2.10.x) Client(2.12)
            2. OST(2.10.x) MDT(2.10.x) Client (2.12)

            Unfortunately, in both scenarios Lustre does its own NID lookup without using LNet to pull the NID information, specifically here:

            /**
             * Retrieve MDT nids from the client log, then start the lwp device.
             * there are only two scenarios which would include mdt nid.
             * 1.
             * marker   5 (flags=0x01, v2.1.54.0) lustre-MDTyyyy  'add mdc' xxx-
             * add_uuid  nid=192.168.122.162@tcp(0x20000c0a87aa2)  0:  1:192.168.122.162@tcp
             * attach    0:lustre-MDTyyyy-mdc  1:mdc  2:lustre-clilmv_UUID
             * setup     0:lustre-MDTyyyy-mdc  1:lustre-MDTyyyy_UUID  2:192.168.122.162@tcp
             * add_uuid  nid=192.168.172.1@tcp(0x20000c0a8ac01)  0:  1:192.168.172.1@tcp
             * add_conn  0:lustre-MDTyyyy-mdc  1:192.168.172.1@tcp
             * modify_mdc_tgts add 0:lustre-clilmv  1:lustre-MDTyyyy_UUID xxxx
             * marker   5 (flags=0x02, v2.1.54.0) lustre-MDTyyyy  'add mdc' xxxx-
             * 2.
             * marker   7 (flags=0x01, v2.1.54.0) lustre-MDTyyyy  'add failnid' xxxx-
             * add_uuid  nid=192.168.122.2@tcp(0x20000c0a87a02)  0:  1:192.168.122.2@tcp
             * add_conn  0:lustre-MDTyyyy-mdc  1:192.168.122.2@tcp
             * marker   7 (flags=0x02, v2.1.54.0) lustre-MDTyyyy  'add failnid' xxxx-
             **/
            static int client_lwp_config_process(const struct lu_env *env,
                                                 struct llog_handle *handle,
                                                 struct llog_rec_hdr *rec, void *data)

            Lustre tries to retrieve the MDT NIDs from the client log and looks at the first NID in the list. In both cases the OST mount fails to reach the MGS, because it uses the tcp1 NID to look up the peer and ends up with this error:

            (events.c:543:ptlrpc_uuid_to_peer()) 192.168.122.117@tcp1->12345-<?>
            (client.c:97:ptlrpc_uuid_to_connection()) cannot find peer 192.168.122.117@tcp1!
            

            This error is independent of the backwards-compatibility issue. My config looks like this:

            OST:
            ----
            net:
                - net type: lo
                  local NI(s):
                    - nid: 0@lo
                      status: up
                - net type: tcp
                  local NI(s):
                    - nid: 192.168.122.114@tcp
                      status: up
                      interfaces:
                          0: eth0
                    - nid: 192.168.122.115@tcp
                      status: up
                      interfaces:
                          0: eth1
            
            MDT:
            ----
            net:
                - net type: lo
                  local NI(s):
                    - nid: 0@lo
                      status: up
                - net type: tcp1
                  local NI(s):
                    - nid: 192.168.122.117@tcp1
                      status: up
                      interfaces:
                          0: eth0
                - net type: tcp
                  local NI(s):
                    - nid: 192.168.122.118@tcp
                      status: up
                      interfaces:
                          0: eth1
            

            I'm curious how you set up your OSTs so that you don't run into the problem above?

            ashehata Amir Shehata (Inactive) added a comment

            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: degremoa Aurelien Degremont (Inactive)
              Votes: 1
              Watchers: 17
