Lustre / LU-2145

The llog_origin_handle_create operation failed with -2

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.5.0
    • Lustre 2.4.0, Lustre 2.5.0
    • 3
    • Orion
    • 2852

    Description

      When running IOR jobs that write to our Orion test system, I am seeing many Lustre error messages of the form:

      Jan  4 16:24:52 zwicky142 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:52 zwicky146 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky150 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky125 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky123 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky151 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:53 zwicky115 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      Jan  4 16:24:54 zwicky147 kernel: LustreError: 11-0: an error occurred while communicating with 10.1.1.211@o2ib9. The llog_origin_handle_create operation failed with -2
      

      These are all messages from the clients, with 10.1.1.211 being the address of our MDS.
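      For readers decoding the message: kernel-style return codes are negated errno values, and on Linux errno 2 is ENOENT ("No such file or directory"), so "failed with -2" means the requested object was not found. A minimal sketch of that decoding follows; sketch_describe_rc() is a hypothetical helper written for this note, not a Lustre function.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Lustre): turn a kernel-style
 * return code (0 or a negated errno) into a readable string.
 */
static const char *sketch_describe_rc(int rc)
{
        assert(rc <= 0);        /* only 0 or negated errno values are valid */
        return rc == 0 ? "success" : strerror(-rc);
}
```

      Calling sketch_describe_rc(-2) yields the C library's text for ENOENT on a Linux system.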

      The clients are running:

      $ rpm -qa | grep lustre-modules
      lustre-modules-2.1.0-15chaos_2.6.32_220.1chaos.ch5.x86_64.x86_64
      

      And the servers:

      # zwicky1 /root > rpm -qa | grep lustre-orion-modules
      lustre-orion-modules-2.2.49.50-2chaos_2.6.32_220.2.1.1chaos.ch5.x86_64.x86_64
      

          Activity

            tappro Mikhail Pershin added a comment - It is taking more time than expected. I am trying to do this the correct way: first move the ptlrpc service start/stop to the target, then handle request processing in a common way. The estimated completion date for the first part is next week.

            tappro Mikhail Pershin added a comment - Sorry, I missed the last discussion here. No, it is far from being closed; these two patches are just preparation for more patches, which will be pushed in a couple of days.

            morrone Christopher Morrone (Inactive) added a comment - I've pulled the first one into our tree and will make sure the message is really gone.
            ian Ian Colle (Inactive) added a comment (edited) - Patches http://review.whamcloud.com/#change,4258 and http://review.whamcloud.com/#change,4273 have both landed. Can it be closed now?
            pjones Peter Jones added a comment - It's OK, Mike. Anyone trying to use the original ORI number will find that it resolves to this ticket in its new location.

            tappro Mikhail Pershin added a comment - Can I move this ticket to the Lustre project, or does someone need it to keep the historical ORI-455 ID?

            behlendorf Brian Behlendorf added a comment - Mike, the following patch fully implements the MDT-style ptlrpc handlers for the MGS. In the process of implementing that support, the console warnings which prompted me to file this issue are fixed.

            http://review.whamcloud.com/3055

            prakash Prakash Surya (Inactive) added a comment - Thanks! That should help me get started.

            tappro Mikhail Pershin added a comment - Right, err_serious() has the same meaning. See mdt_req_handle(): it is the common handler, which calls the proper operation handler and gets 'rc' back, and there is otherwise no way to know the type of 'rc'. Check the comments there:

                    if (likely(rc == 0)) {
                            /*
                             * Process request, there can be two types of rc:
                             * 1) errors with msg unpack/pack, other failures outside the
                             * operation itself. This is counted as serious errors;
                             * 2) errors during fs operation, should be placed in rq_status
                             * only
                             */
                            rc = h->mh_act(info);
                            if (rc == 0 &&
                                !req->rq_no_reply && req->rq_reply_state == NULL) {
                                    DEBUG_REQ(D_ERROR, req, "MDT \"handler\" %s did not "
                                              "pack reply and returned 0 error\n",
                                              h->mh_name);
                                    LBUG();
                            }
                            serious = is_serious(rc);
                            rc = clear_serious(rc);

            The same technique can be used for mgs_handler as well, if it calls functions that do not only the pure processing but also the pack/unpack work and other sanity checks of RPC consistency.
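            To make the tagging idea concrete, here is a minimal standalone sketch of how a "serious" marker can travel inside an ordinary negative return code. The sketch_ names and the flag value are assumptions made for illustration; they mirror the roles of err_serious(), is_serious() and clear_serious() in the excerpt above but are not copied from the Lustre source.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bit, assumed to sit above every real errno value. */
#define SKETCH_ERR_SERIOUS 0x0001000

/* Tag a negative errno as a serious (RPC-level) failure. */
static inline int sketch_err_serious(int rc)
{
        assert(rc < 0);
        assert(-rc < SKETCH_ERR_SERIOUS);
        return -(-rc | SKETCH_ERR_SERIOUS);
}

/* Non-zero if rc carries the serious tag. */
static inline int sketch_is_serious(int rc)
{
        return rc < 0 && ((-rc) & SKETCH_ERR_SERIOUS) != 0;
}

/* Strip the tag and recover the plain negative errno (0 stays 0). */
static inline int sketch_clear_serious(int rc)
{
        if (rc < 0)
                rc = -(-rc & ~SKETCH_ERR_SERIOUS);
        return rc;
}
```

            With this scheme a handler can return sketch_err_serious(-EPROTO) for a malformed request and a plain -ENOENT for a failed operation, and the common caller can tell the two apart before clearing the tag.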

            adilger Andreas Dilger added a comment - In the "good old days" (before err_serious existed) the rule was that if there was an error processing the RPC itself (e.g. corruption in the request, badly formed RPC message buffers, bad opcode, etc.) then the request handler would return a non-zero error (which is eventually stored in rq_status/pb_status), rq_type = PTL_RPC_MSG_ERR would be returned (via ptlrpc_send_error()), and this would result in a (hopefully rare) console message.

            If there was an error while executing the request (e.g. a permission error on a file, ENOMEM/ENOSPC/... in the filesystem, etc.) then the operation error is stored in rq_status and the handler returns 0 to the RPC handler code, which sets rq_type to PTL_RPC_MSG_REPLY (via ptlrpc_send_reply()), with nothing printed on the console. In some cases, code got this wrong and incorrectly caused console messages to be printed (which should be considered a bug to be fixed).

            I don't know the details of how err_serious() factors into this, but maybe this gives you some background. Hopefully Alex or Mike can chime in on the details.
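            The older rule described above can be modelled with a small standalone sketch. Everything here (the sketch_ types, fields and handlers) is hypothetical and only stands in for rq_status, rq_type and the two reply paths; it does not call any Lustre or ptlrpc function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for PTL_RPC_MSG_REPLY and PTL_RPC_MSG_ERR. */
enum sketch_msg_type { SKETCH_MSG_REPLY, SKETCH_MSG_ERR };

struct sketch_request {
        int status;                 /* plays the role of rq_status */
        enum sketch_msg_type type;  /* plays the role of rq_type */
        int console_msgs;           /* how many console messages would result */
};

typedef int (*sketch_handler_t)(struct sketch_request *req);

/* The operation failed (object not found): record it and return 0. */
static int sketch_op_not_found(struct sketch_request *req)
{
        req->status = -ENOENT;
        return 0;
}

/* The RPC itself is malformed: return a non-zero error. */
static int sketch_bad_rpc(struct sketch_request *req)
{
        (void)req;
        return -EPROTO;
}

/* Common dispatch: the handler's return value selects the reply type. */
static void sketch_dispatch(struct sketch_request *req, sketch_handler_t h)
{
        int rc;

        assert(req != NULL && h != NULL);
        req->status = 0;
        req->console_msgs = 0;

        rc = h(req);
        if (rc != 0) {
                /* RPC-level failure: error message type, console message. */
                req->status = rc;
                req->type = SKETCH_MSG_ERR;
                req->console_msgs++;
        } else {
                /* Normal reply; any operation error stays in status only. */
                req->type = SKETCH_MSG_REPLY;
        }
}
```

            Under this model, a handler that returns -ENOENT directly instead of storing it in the status field would take the error path and produce a console message, which is the kind of mistake described in the comment above.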

            prakash Prakash Surya (Inactive) added a comment - I'm looking at making a patch for this issue, but I'm a little confused as to when err_serious() should be used and when it shouldn't. Can anybody provide me with some background info as to when and why err_serious() should be used? I'm looking at the MDT handler code, but can't spot a pattern to its usage there.

            People

              tappro Mikhail Pershin
              prakash Prakash Surya (Inactive)