Lustre / LU-9990

MDS fails to mount due to (client.c:96:ptlrpc_uuid_to_connection()) cannot find peer MGC10.37.248.196@o2ib1_0!

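Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.11.0
    • Fix Version/s: Lustre 2.11.0
    • Labels: None
    • Environment: Latest Lustre 2.10.5X running on RHEL 7.4 with the default OFED, using the InfiniBand (o2ib) LND.
    • Severity: 2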

    Description

      Recently I started running into intermittent failures when mounting the MDT. With the latest master, the MDT now fails to mount every time. The debug log on the MDT shows the following error:

      (client.c:96:ptlrpc_uuid_to_connection()) cannot find peer MGC10.37.248.196@o2ib1_0!

      Attachments

        1. dump-mds.log
          20 kB
        2. dump-mgs.log
          1.70 MB


          Activity

            mdiep Minh Diep added a comment -

            ashehata said we don't need this for LTS

            pjones Peter Jones added a comment -

            Landed for 2.11


            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29333/
            Subject: LU-9990 lnet: add backwards compatibility for YAML config
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3187d551d538bd8203c7156daaa617620c6569ab

            ashehata Amir Shehata (Inactive) added a comment (edited) -

            Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/29333
            Subject: LU-9990 lnet: add backwards compatibility for YAML config
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e72cdc1373dfb930eccbc5d9afba215d8368b331


            ashehata Amir Shehata (Inactive) added a comment -

            OK. I'll make a change to handle the "numa" entry in the YAML file so that master is backwards compatible with 2.10. In the meantime, if there is an error during configuration, you should assume that the configuration is incomplete and that the node is not really usable.
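            The fix described above (accepting the 2.10-era "numa" entry so an older lnet.conf still imports) and the advice to treat any import error as an incomplete configuration can be pictured with a minimal, hypothetical sketch of a table-driven importer. This is not lnetctl's actual code or the content of patch 29333: the handler names and the "net"/"route" sections are illustrative assumptions, and the YAML file is assumed to be already parsed into a dict.

```python
from typing import Callable, Dict, List


def handle_net(body: object) -> None:
    """Hypothetical handler for a "net" section."""


def handle_route(body: object) -> None:
    """Hypothetical handler for a "route" section."""


def handle_legacy_numa(body: object) -> None:
    """Hypothetical compatibility handler: accept the 2.10-era "numa"
    entry instead of rejecting it."""


# Top-level key -> handler. Only "numa" comes from the ticket; the other
# keys are placeholders for illustration.
HANDLERS: Dict[str, Callable[[object], None]] = {
    "net": handle_net,
    "route": handle_route,
    "numa": handle_legacy_numa,  # backwards-compatibility entry
}


def import_config(doc: Dict[str, object]) -> List[str]:
    """Dispatch each top-level key of an already-parsed YAML document.

    Returns the keys that had no handler. A non-empty result means the
    configuration is incomplete and the node should not be treated as usable.
    """
    missing: List[str] = []
    for key, body in doc.items():
        handler = HANDLERS.get(key)
        if handler is None:
            missing.append(key)
        else:
            handler(body)
    return missing


# A 2.10-style document with a "numa" entry now imports without errors,
# while a genuinely unknown key is still reported.
print(import_config({"net": [], "numa": {}}))   # []
print(import_config({"net": [], "bogus": {}}))  # ['bogus']
```

            In this sketch an unknown key does not abort the import; it is collected and returned so the caller can decide that the node is not usable, which mirrors the advice in the comment rather than any confirmed lnetctl behavior.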

            simmonsja James A Simmons added a comment -

            Yes, I do see a "call back from 'num' not found" error when I start up. Also, I was using the lnet.conf from my 2.10 setup. I'm running lnetctl import from the command line, not from a script.

            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: simmonsja James A Simmons
              Votes: 0
              Watchers: 12
