[LU-2145] The llog_origin_handle_create operation failed with -2

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.4.0, Lustre 2.5.0
    • 3
    • Orion
    • 2852

    Description

      While running IOR write jobs against our Orion test system, I am seeing many Lustre error messages of the form:

      Jan  4 16:24:52 zwicky142 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:52 zwicky146 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky150 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky125 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky123 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky151 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky115 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:54 zwicky147 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      

      These are all messages from the clients, with 10.1.1.211 being the address of our MDS.
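For triage, the messages above can be summarized by peer, operation, and return code. The following is a minimal sketch, assuming the log line layout shown in this report; the function names `parse_line` and `summarize` and the regular expression are illustrative and not part of Lustre. Lustre reports failures as negative Linux errno values, so -2 corresponds to ENOENT ("No such file or directory").

```python
import errno
import re
from collections import Counter

# Regex for the client-side message format quoted in this report.
# The group names (host, peer, op, rc) are illustrative choices.
LINE_RE = re.compile(
    r"(?P<host>\S+) kernel: LustreError: 11-0: an error occurred while "
    r"communicating with (?P<peer>\S+)\. "
    r"The (?P<op>\w+) operation failed with (?P<rc>-?\d+)"
)

# One line copied from the report, used as sample input.
SAMPLE = (
    "Jan  4 16:24:52 zwicky142 kernel: LustreError: 11-0: an error occurred "
    "while communicating with 10.1.1.211@o2ib9. "
    "The llog_origin_handle_create operation failed with -2"
)


def parse_line(line):
    """Return a dict describing one LustreError 11-0 line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    return {
        "host": m.group("host"),
        "peer": m.group("peer"),
        "op": m.group("op"),
        "rc": rc,
        # The return code is a negated errno, so negate it before the lookup.
        "errno_name": errno.errorcode.get(-rc, "UNKNOWN"),
    }


def summarize(lines):
    """Count matching messages per (peer, operation, errno name)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[(rec["peer"], rec["op"], rec["errno_name"])] += 1
    return counts


if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Feeding the client syslog through `summarize` would show whether every failure targets the same peer and operation, as the excerpt above suggests.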

      The clients are running:

      $ rpm -qa | grep lustre-modules
      lustre-modules-2.1.0-15chaos_2.6.32_220.1chaos.ch5.x86_64.x86_64
      

      And the servers:

      # zwicky1 /root > rpm -qa | grep lustre-orion-modules
      lustre-orion-modules-2.2.49.50-2chaos_2.6.32_220.2.1.1chaos.ch5.x86_64.x86_64
      

          Activity

            tappro Mikhail Pershin added a comment - http://review.whamcloud.com/4826 is a combined patch that fixes the initial problem. All service handling stays in the MGS for now, but the unified request handler is used to handle MGS requests.

            tappro Mikhail Pershin added a comment - http://review.whamcloud.com/#change,4715 moves the MGS ptlrpc services into the unified server. It does not solve the original problem; it only prepares the ground. The next patch will implement all MGS request handlers in the unified way and fix this issue.

            tappro Mikhail Pershin added a comment - Chris, this is still in progress. I am adapting Brian's patch to master, trying to keep it small and to avoid touching the MDS side, since that is critical for the DNE landing. I expect to push the patch to Gerrit in a few days.

            morrone Christopher Morrone (Inactive) added a comment - How is it going, Mike?

            adilger Andreas Dilger added a comment - Mike, is the original problem in this bug fixed, and are you currently working on the unified target code cleanup? If that is the case, I'd like to reduce the severity of this bug from being a 2.4.0 blocker.

            tappro Mikhail Pershin added a comment - This is taking more time than expected. I am trying to do it the correct way: first move ptlrpc service start/stop into the target, then handle request processing in a common way. The first part should be complete next week.

            tappro Mikhail Pershin added a comment - Sorry, I missed the last discussion here. No, it is far from being closed; these two patches are just preparation for more patches, which will be pushed in a couple of days.

            morrone Christopher Morrone (Inactive) added a comment - I've pulled the first one into our tree and will make sure the message is really gone.

            ian Ian Colle (Inactive) added a comment (edited) - Patches http://review.whamcloud.com/#change,4258 and http://review.whamcloud.com/#change,4273 have both landed. Can it be closed now?

            pjones Peter Jones added a comment - It's ok Mike - anyone trying to use the original ORI number will find it resolves to this ticket in its new location

            tappro Mikhail Pershin added a comment - Can I move this ticket to the Lustre project, or does someone need it to keep its historical ORI-455 ID?

            People

              tappro Mikhail Pershin
              prakash Prakash Surya (Inactive)