[LU-2145] The llog_origin_handle_create operation failed with -2

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.4.0, Lustre 2.5.0
    • 3
    • Orion
    • 2852

    Description

      When running IOR write jobs against our Orion test system, I am seeing many Lustre error messages of the form:

      Jan  4 16:24:52 zwicky142 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:52 zwicky146 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky150 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky125 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky123 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky151 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky115 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:54 zwicky147 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      

      These are all messages from the clients, with 10.1.1.211 being the address of our MDS.
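For triage, the messages above can be summarized per client and the return code decoded. The following is a minimal sketch, not part of the original report: it assumes the syslog line layout shown above and the usual kernel convention that failures are reported as negative errno values, under which -2 corresponds to ENOENT ("No such file or directory") on Linux. The names `parse_line` and `count_failures` are hypothetical helpers introduced only for this illustration.

```python
import errno
import os
import re
from collections import Counter

# Pattern for the "LustreError: 11-0" console lines quoted in the description.
# The layout is assumed from those samples only.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+kernel:\s+LustreError:\s+11-0:.*?"
    r"communicating with (?P<peer>\S+?)\.\s+"
    r"The (?P<op>\w+) operation failed with (?P<rc>-?\d+)"
)


def parse_line(line):
    """Return a dict describing one failure line, or None if it does not match."""
    m = LINE_RE.search(line.strip())
    if m is None:
        return None
    rc = int(m.group("rc"))
    return {
        "host": m.group("host"),   # client that logged the message
        "peer": m.group("peer"),   # NID of the server it was talking to
        "op": m.group("op"),       # name of the failed operation
        "rc": rc,
        # Assumption: rc is a negative errno value, so -rc indexes errno.errorcode.
        "errno_name": errno.errorcode.get(-rc, "UNKNOWN"),
        "errno_text": os.strerror(-rc) if rc < 0 else "",
    }


def count_failures(lines):
    """Count matching failure lines per (host, operation, errno name)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[(rec["host"], rec["op"], rec["errno_name"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jan  4 16:24:52 zwicky142 kernel: LustreError: 11-0: an error occurred while "
        "communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2",
        "Jan  4 16:24:52 zwicky146 kernel: LustreError: 11-0: an error occurred while "
        "communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2",
    ]
    for key, n in sorted(count_failures(sample).items()):
        print(key, n)
```

Feeding the full client syslog through `count_failures` gives a quick view of which clients are affected and whether every failure shares the same operation and return code.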

      The clients are running:

      $ rpm -qa | grep lustre-modules
      lustre-modules-2.1.0-15chaos_2.6.32_220.1chaos.ch5.x86_64.x86_64
      

      And the servers:

      # zwicky1 /root > rpm -qa | grep lustre-orion-modules
      lustre-orion-modules-2.2.49.50-2chaos_2.6.32_220.2.1.1chaos.ch5.x86_64.x86_64
      

          Activity

            adilger Andreas Dilger added a comment -

            Chris, what are the current spamming messages? The llog code has already undergone significant restructuring, and I can't find either llog_origin_handle_create() or "operation failed" in the current code.

            I'm happy to land a patch to quiet the spammy messages, but the unified target patch is 1500 lines, and at this point we are down to the last few blockers and trying to limit the size and complexity of landed patches.

            morrone Christopher Morrone (Inactive) added a comment -

            Something needs to happen before 2.4.0 to stop the console error message spam.

            pjones Peter Jones added a comment -

            James

            Indeed. The senior engineers involved are entirely focused on the remaining issues blocking 2.4. This work is expected to be completed for 2.5.

            Peter

            simmonsja James A Simmons added a comment -

            This patch, http://review.whamcloud.com/4826, seems to have gone into limbo.

            tappro Mikhail Pershin added a comment -

            This patch solves the initial problem with the spamming error messages, and it is also related to UT.

            jlevi Jodi Levi (Inactive) added a comment -

            Please confirm that there is no defect outstanding and this is just the Unified Target patch being landed.

            tappro Mikhail Pershin added a comment -

            http://review.whamcloud.com/4826

            Combined patch to fix the initial problem. All service handling is kept in the MGS for now, but the unified request handler is used to handle MGS requests.

            tappro Mikhail Pershin added a comment -

            Move MGS ptlrpc services into the unified server: http://review.whamcloud.com/#change,4715
            This doesn't solve the original problem; it only prepares things. The next patch will handle all MGS requests in a unified way and fix this issue.

            tappro Mikhail Pershin added a comment -

            Chris, this is still in progress. I am adapting Brian's patch to master, trying to keep it small and not touch the MDS side, as that is critical for the DNE landing. I expect to push the patch to Gerrit in a few days.

            morrone Christopher Morrone (Inactive) added a comment -

            How is it going, Mike?

            adilger Andreas Dilger added a comment -

            Mike, is the original problem in this bug fixed, and are you currently working on the unified target code cleanup? If that is the case, I'd like to reduce the severity of this bug so that it is no longer a 2.4.0 blocker.

            People

              tappro Mikhail Pershin
              prakash Prakash Surya (Inactive)