Lustre / LU-3645

Interop 2.1.5 <--> 2.4 Write operations during failover error out instead of stalling

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.4.0, Lustre 2.1.5
    • 3
    • 9382

    Description

      During acceptance testing for KIT, they tried OSS failover while running several applications, and the applications got I/O errors ("can't create file" and similar messages). This should not happen; I/O should simply stall until failover completes.
      The clients were running 2.4 and the servers 2.1.5. We tried with 2.1.5 clients and did not see this issue. I have attached the client and server logs.
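
      For illustration, a minimal sketch of the kind of client workload that exposed the problem, assuming a Lustre client mounted at the hypothetical path /mnt/lustre; with the bug, the open()/write() calls fail with EIO during OSS failover, whereas the expected behaviour is for them to block until recovery completes:

      #include <errno.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>

      int main(void)
      {
              char path[64], buf[4096];
              int i, fd;

              memset(buf, 'x', sizeof(buf));
              for (i = 0; ; i++) {
                      /* create and write one small file per iteration */
                      snprintf(path, sizeof(path), "/mnt/lustre/small.%d", i);
                      fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
                      if (fd < 0 || write(fd, buf, sizeof(buf)) < 0) {
                              /* should not trigger during failover; I/O is
                               * expected to stall, not fail */
                              fprintf(stderr, "I/O error on %s: %s\n",
                                      path, strerror(errno));
                              return 1;
                      }
                      close(fd);
              }
      }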

      Attachments

        1. client_lctl_dk_20130911.tgz
          18 kB
        2. client_messages_20130911.tgz
          228 kB
        3. ll10987.out.gz
          0.2 kB
        4. LU-XXXX.tgz
          463 kB
        5. mds1.llog.gz
          224 kB
        6. mds2.llog.gz
          0.2 kB
        7. server_lctl_dk_20130911.tgz
          392 kB
        8. ucbn003.localdomain.llog.gz
          0.2 kB

        Activity

          [LU-3645] Interop 2.1.5 <--> 2.4 Write operations during failover error out instead of stalling

          Hi Kit,

          On master, object creation already waits for precreation to recover, in "osp_declare_object_create"/"osp_precreate_reserve"/"osp_precreate_ready_condition":

          static int osp_precreate_ready_condition(const struct lu_env *env,
                                                   struct osp_device *d)
          {
                  if (d->opd_pre_recovering)
                          return 0;
          
                  /* ready if got enough precreated objects */
                  /* we need to wait for others (opd_pre_reserved) and our object (+1) */
                  if (d->opd_pre_reserved + 1 < osp_objs_precreated(env, d))
                          return 1;
          
                  /* ready if OST reported no space and no destroys in progress */
                  if (d->opd_syn_changes + d->opd_syn_rpc_in_progress == 0 &&
                      d->opd_pre_status == -ENOSPC)
                          return 1;
          
                  /* Bail out I/O fails to OST */
                  if (d->opd_pre_status == -EIO)
                          return 1;
          
                  return 0;
          }
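
          For context, here is a simplified sketch of how the reserving thread might block on this predicate; the wait-queue field name (opd_pre_user_waitq) and the use of the plain kernel wait_event_interruptible() are assumptions for illustration, not a copy of the actual osp_precreate_reserve():

          /* hypothetical sketch, not the real osp_precreate_reserve() */
          static int osp_precreate_reserve_sketch(const struct lu_env *env,
                                                  struct osp_device *d)
          {
                  int rc;

                  /* sleep until precreation catches up, the OST reports
                   * -ENOSPC, or precreation fails hard with -EIO */
                  rc = wait_event_interruptible(d->opd_pre_user_waitq,
                                osp_precreate_ready_condition(env, d));
                  if (rc != 0)
                          return rc;      /* interrupted by a signal */

                  if (d->opd_pre_status == -EIO ||
                      d->opd_pre_status == -ENOSPC)
                          return d->opd_pre_status;

                  /* the real code serializes this under opd_pre_lock */
                  d->opd_pre_reserved++;
                  return 0;
          }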
          
          hongchao.zhang Hongchao Zhang added a comment

          Hi Hongchao,

          Do you think something like this patch is necessary on master? The osc appears to be a lot different in that version.

          Thanks,
          Kit

          kitwestneat Kit Westneat (Inactive) added a comment

          Perfect! I confirmed with the customer that this patch fixes their issue, so once this lands, I guess we can close the ticket.

          kitwestneat Kit Westneat (Inactive) added a comment

          Okay, the patch has been updated to suppress this error message if the corresponding request failed.

          Thanks

          hongchao.zhang Hongchao Zhang added a comment - edited
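
          For illustration only, here is a hedged sketch of the kind of guard such a change could add in the create-interpret path; this is not the actual patch under review, and the function body below is an assumption:

          /* hypothetical sketch: skip unpacking the reply body when the RPC
           * itself failed, so __req_capsule_get() never emits the noisy
           * "Wrong buffer for field `ost_body'" error for failed requests */
          static int osc_interpret_create_sketch(const struct lu_env *env,
                                                 struct ptlrpc_request *req,
                                                 void *data, int rc)
          {
                  struct ost_body *body = NULL;

                  if (rc == 0)
                          body = req_capsule_server_get(&req->rq_pill,
                                                        &RMF_OST_BODY);
                  if (body == NULL)
                          return rc ? rc : -EPROTO;

                  /* normal create-reply handling would continue here */
                  return 0;
          }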

          Would it be possible to suppress the warning message in this case? I think it is going to confuse sysadmins.

          kitwestneat Kit Westneat (Inactive) added a comment

          This error message should not cause issues: it only indicates that there was no "ost_body" in the reply message (server side) of the failed ost_create request, and 208 is just the size of "ost_body":

              if (field->rmf_flags & RMF_F_STRUCT_ARRAY) {
                  /*
                   * We've already asserted that field->rmf_size > 0 in
                   * req_layout_init().
                   */
                  len = lustre_msg_buflen(msg, offset);
                  if ((len % field->rmf_size) != 0) {
                      CERROR("%s: array field size mismatch "
                             "%d modulo %d != 0 (%d)\n",
                             field->rmf_name, len, field->rmf_size, loc);
                      return NULL;
                  }
              } else if (pill->rc_area[loc][offset] != -1) {
                  len = pill->rc_area[loc][offset];
              } else {
                  len = max(field->rmf_size, 0);
              }
              value = getter(msg, offset, len);
          
              if (value == NULL) {
                  DEBUG_REQ(D_ERROR, pill->rc_req,
                            "Wrong buffer for field `%s' (%d of %d) "
                            "in format `%s': %d vs. %d (%s)\n",
                            field->rmf_name, offset, lustre_msg_bufcount(msg),
                            fmt->rf_name, lustre_msg_buflen(msg, offset), len,
                            rcl_names[loc]);
          

          Yes, the two patches should be combined, and the patch has been updated.
          Thanks

          hongchao.zhang Hongchao Zhang added a comment

          With both patch sets 4 and 5 applied, the issue appears to be resolved:

          LustreError: 11-0: an error occurred while communicating with 172.26.17.4@o2ib. The ost_create operation failed with -107
          Lustre: pfscdat2-OST0000-osc-MDT0000: Connection to pfscdat2-OST0000 (at 172.26.17.4@o2ib) was lost; in progress operations using this service will wait for recovery to complete
          LustreError: 9266:0:(layout.c:1659:__req_capsule_get()) @@@ Wrong buffer for field `ost_body' (1 of 1) in format `OST_CREATE': 0 vs. 208 (server)
            req@ffff880808005800 x1450889175112924/t0(0) o5->pfscdat2-OST0000-osc-MDT0000@172.26.17.4@o2ib:7/4 lens 400/192 e 0 to 0 dl 1383677590 ref 1 fl Interpret:RN/0/0 rc -107/-107
          LustreError: 11-0: an error occurred while communicating with 172.26.17.4@o2ib. The ost_connect operation failed with -19
          LustreError: 9266:0:(osc_create.c:177:osc_interpret_create()) @@@ Unknown rc -107 from async create: failing oscc  req@ffff880808005800 x1450889175112924/t0(0) o5->pfscdat2-OST0000-osc-MDT0000@172.26.17.4@o2ib:7/4 lens 400/192 e 0 to 0 dl 1383677590 ref 1 fl Interpret:RN/0/0 rc -107/-107
          Lustre: pfscdat2-OST0000-osc-MDT0000: Connection restored to pfscdat2-OST0000 (at 172.26.17.3@o2ib)
          Lustre: MDS mdd_obd-pfscdat2-MDT0000: pfscdat2-OST0000_UUID now active, resetting orphans
          LustreError: 11-0: an error occurred while communicating with 172.26.17.3@o2ib. The ost_statfs operation failed with -107
          Lustre: pfscdat2-OST0000-osc-MDT0000: Connection to pfscdat2-OST0000 (at 172.26.17.3@o2ib) was lost; in progress operations using this service will wait for recovery to complete
          Lustre: pfscdat2-OST0000-osc-MDT0000: Connection restored to pfscdat2-OST0000 (at 172.26.17.4@o2ib)
          Lustre: MDS mdd_obd-pfscdat2-MDT0000: pfscdat2-OST0000_UUID now active, resetting orphans
          LustreError: 11-0: an error occurred while communicating with 172.26.17.4@o2ib. The ost_statfs operation failed with -107
          LustreError: Skipped 1 previous similar message
          Lustre: pfscdat2-OST0000-osc-MDT0000: Connection to pfscdat2-OST0000 (at 172.26.17.4@o2ib) was lost; in progress operations using this service will wait for recovery to complete
          LustreError: 9266:0:(osc_create.c:177:osc_interpret_create()) @@@ Unknown rc -19 from async create: failing oscc  req@ffff881001a70000 x1450889176260853/t0(0) o5->pfscdat2-OST0000-osc-MDT0000@172.26.17.4@o2ib:7/4 lens 400/400 e 0 to 0 dl 1383678377 ref 1 fl Interpret:RN/0/0 rc -19/-19
          Lustre: pfscdat2-OST0000-osc-MDT0000: Connection restored to pfscdat2-OST0000 (at 172.26.17.3@o2ib)
          Lustre: MDS mdd_obd-pfscdat2-MDT0000: pfscdat2-OST0000_UUID now active, resetting orphans
          LustreError: 11-0: an error occurred while communicating with 172.26.17.3@o2ib. The obd_ping operation failed with -107
          LustreError: Skipped 2 previous similar messages
          Lustre: pfscdat2-OST0000-osc-MDT0000: Connection to pfscdat2-OST0000 (at 172.26.17.3@o2ib) was lost; in progress operations using this service will wait for recovery to complete
          Lustre: pfscdat2-OST0000-osc-MDT0000: Connection restored to pfscdat2-OST0000 (at 172.26.17.4@o2ib)
          Lustre: MDS mdd_obd-pfscdat2-MDT0000: pfscdat2-OST0000_UUID now active, resetting orphans
          

          This error message looks somewhat concerning:

          LustreError: 9266:0:(layout.c:1659:__req_capsule_get()) @@@ Wrong buffer for field `ost_body' (1 of 1) in format `OST_CREATE': 0 vs. 208 (server)
          

          Do you know if this should cause issues?

          kitwestneat Kit Westneat (Inactive) added a comment

          Do you think that patch set 4 would fix this issue? It actually seems as if we are hitting two separate issues, depending on where in the code path the OSS fails.

          kitwestneat Kit Westneat (Inactive) added a comment

          Hi Hongchao,

          Still getting errors:

          Nov  4 21:33:56 pfscn2 kernel: : LustreError: 11-0: an error occurred while communicating with 172.26.17.4@o2ib. The ost_create operation failed with -107
          Nov  4 21:33:56 pfscn2 kernel: : Lustre: pfscdat2-OST0000-osc-MDT0000: Connection to pfscdat2-OST0000 (at 172.26.17.4@o2ib) was lost; in progress operations using this service will wait for recovery to complete
          Nov  4 21:33:56 pfscn2 kernel: : LustreError: 9244:0:(layout.c:1659:__req_capsule_get()) @@@ Wrong buffer for field `ost_body' (1 of 1) in format `OST_CREATE': 0 vs. 208 (server)
          Nov  4 21:33:56 pfscn2 kernel: : req@ffff8808092fa000 x1450802648479850/t0(0) o5->pfscdat2-OST0000-osc-MDT0000@172.26.17.4@o2ib:7/4 lens 400/192 e 0 to 0 dl 1383597243 ref 1 fl Interpret:RN/0/0 rc -107/-107
          Nov  4 21:33:56 pfscn2 kernel: : LustreError: 11-0: an error occurred while communicating with 172.26.17.4@o2ib. The ost_connect operation failed with -19
          Nov  4 21:33:56 pfscn2 kernel: : LustreError: 9244:0:(osc_create.c:177:osc_interpret_create()) @@@ Unknown rc -107 from async create: failing oscc  req@ffff8808092fa000 x1450802648479850/t0(0) o5->pfscdat2-OST0000-osc-MDT0000@172.26.17.4@o2ib:7/4 lens 400/192 e 0 to 0 dl 1383597243 ref 1 fl Interpret:RN/0/0 rc -107/-107
          Nov  4 21:33:56 pfscn2 kernel: : LustreError: 12037:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x1377d sub-object on OST idx 0/1: rc = -5
          Nov  4 21:34:48 pfscn2 kernel: : Lustre: pfscdat2-OST0000-osc-MDT0000: Connection restored to pfscdat2-OST0000 (at 172.26.17.3@o2ib)
          

          Are you able to reproduce the error? Basically, we reproduce it by having 8 clients write small files and then failing over an OSS.

          kitwestneat Kit Westneat (Inactive) added a comment

          The new patch is tracked at http://review.whamcloud.com/#/c/7559/

          hongchao.zhang Hongchao Zhang added a comment

          People

            Assignee: hongchao.zhang Hongchao Zhang
            Reporter: gshilamkar Girish Shilamkar (Inactive)
            Votes: 0
            Watchers: 10

            Dates

              Created:
              Updated:
              Resolved: