The llog_origin_handle_create operation failed with -2

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.4.0, Lustre 2.5.0
    • 3
    • Orion
    • 2852

    Description

      When running IOR write tests against our Orion test system, I see many Lustre error messages of the form:

      Jan  4 16:24:52 zwicky142 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:52 zwicky146 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky150 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky125 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky123 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky151 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky115 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:54 zwicky147 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      

      These are all messages from the clients, with 10.1.1.211 being the address of our MDS.
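The messages above can be tallied per peer and operation with a short script. This is a minimal sketch, assuming the syslog line layout shown in this ticket; the names `LINE_RE`, `parse_line`, and `summarize` are hypothetical helpers and not part of Lustre. Kernel return codes are negated errno values, so -2 corresponds to ENOENT ("No such file or directory").

```python
import errno
import re
from collections import Counter

# Pattern for the client-side console message quoted in this ticket.
# Assumption: other Lustre versions may word the message differently.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"LustreError:\s+11-0:\s+an error occurred while communicating with\s+"
    r"(?P<peer>\S+)\.\s+The\s+(?P<op>\w+)\s+operation failed with\s+(?P<rc>-?\d+)"
)


def parse_line(line):
    """Return a dict describing one error line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rc = int(m.group("rc"))
    return {
        "host": m.group("host"),
        "peer": m.group("peer"),
        "op": m.group("op"),
        "rc": rc,
        # Kernel-style return codes are negated errno values.
        "errno_name": errno.errorcode.get(-rc, "UNKNOWN"),
    }


def summarize(lines):
    """Count matching lines per (peer, operation, errno name)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[(rec["peer"], rec["op"], rec["errno_name"])] += 1
    return counts
```

Feeding the client syslog through `summarize` gives one count per (peer, operation, errno) triple, which makes it easy to confirm that every message points at the same MDS address and the same operation.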

      The clients are running:

      $ rpm -qa | grep lustre-modules
      lustre-modules-2.1.0-15chaos_2.6.32_220.1chaos.ch5.x86_64.x86_64
      

      And the servers:

      # zwicky1 /root > rpm -qa | grep lustre-orion-modules
      lustre-orion-modules-2.2.49.50-2chaos_2.6.32_220.2.1.1chaos.ch5.x86_64.x86_64
      

          Activity

            [LU-2145] The llog_origin_handle_create operation failed with -2

            tappro Mikhail Pershin added a comment -

            The patch has landed.

            morrone Christopher Morrone (Inactive) added a comment -

            Andreas, I think the error comes from ptlrpc_check_status(). On master the message has changed a bit, so that a simple grep for "operation failed" will miss it, and "llog_origin_handle_create" is a %s that is filled in by ll_opcode2str(opc).

            But as far as I know, the situation that triggers ptlrpc_check_status() to print an error message still remains. According to the most recent comment from Mike in this ticket, the original problem for which this ticket was opened is fixed only in the larger UT patch, which has not yet landed.

            I too am fine with a patch that just quiets the console message, as long as nothing bad is really happening behind the scenes that will bite us down the road. That is why I posted that something needs to happen. I wasn't particularly advocating for UT this close to the release date.
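The point about the %s can be illustrated with a small sketch. It assumes the message wording seen in the log at the top of this ticket; the opcode table and its numeric value are placeholders, and `opcode_to_str` and `format_status_error` are hypothetical stand-ins rather than the Lustre functions themselves. Because the operation name is substituted at run time, searching the source tree for the full console message finds nothing.

```python
# Stand-in opcode table. The numeric value is a placeholder chosen for
# illustration; it is NOT taken from the Lustre headers.
OPCODE_NAMES = {1: "llog_origin_handle_create"}


def opcode_to_str(opc):
    """Hypothetical analogue of an opcode-to-name lookup."""
    return OPCODE_NAMES.get(opc, "<unknown opcode %d>" % opc)


# Message wording copied from the console log in this ticket; the operation
# name is a %s placeholder, so it never appears literally in the format.
MESSAGE_FMT = (
    "an error occurred while communicating with %s. "
    "The %s operation failed with %d"
)


def format_status_error(peer, opc, rc):
    """Build the console message from the peer, opcode, and return code."""
    return MESSAGE_FMT % (peer, opcode_to_str(opc), rc)
```

With this shape, a search for "llog_origin_handle_create operation failed" matches only the generated output and never the format string, which is one way a grep of the source can come up empty.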

            adilger Andreas Dilger added a comment -

            Chris, what are the current spamming messages? The llog code has already undergone some significant restructuring, and I can't find either llog_origin_handle_create() or "operation failed" in the current code.

            I'm happy to land a patch to quiet the spammy messages, but the unified target patch is 1500 lines, and at this point we are down to the last few blockers and trying to limit the size and complexity of landed patches.

            morrone Christopher Morrone (Inactive) added a comment -

            Something needs to happen before 2.4.0 to stop the console error message spam.

            pjones Peter Jones added a comment -

            James

            Indeed. The senior engineers involved are entirely focused on the remaining issues blocking 2.4. This work is expected to be brought to completion for 2.5.

            Peter

            simmonsja James A Simmons added a comment -

            This patch, http://review.whamcloud.com/4826, seems to have gone into limbo.

            tappro Mikhail Pershin added a comment -

            This patch solves the initial problem with spamming error messages, and it is related to UT at the same time.

            jlevi Jodi Levi (Inactive) added a comment -

            Please confirm that there is no defect outstanding and this is just the Unified Target patch being landed.

            tappro Mikhail Pershin added a comment -

            http://review.whamcloud.com/4826

            Combined patch to fix the initial problem. All service handling is kept in the MGS for now, but the unified request handler is used to handle MGS requests.

            tappro Mikhail Pershin added a comment -

            Move MGS ptlrpc services into the unified server: http://review.whamcloud.com/#change,4715
            This does not solve the original problem; it just prepares things. The next patch will implement all MGS request handlers in the unified way and fix this issue.

            People

              tappro Mikhail Pershin
              prakash Prakash Surya (Inactive)
              Votes: 0
              Watchers: 8
