[LU-1368] lctl abort_recovery deadlocked

    Description

      Recovery was aborted on an OST because it was taking too long (possibly due to LU-1352); recovery never completed for that OST. The 'lctl abort_recovery --device=3' process hung with the following backtrace:

      PID: 25072  TASK: ffff880311ab2080  CPU: 9   COMMAND: "lctl"
       #0 [ffff880311b5fae8] schedule at ffffffff814eeee0
       #1 [ffff880311b5fbb0] schedule_timeout at ffffffff814efd95
       #2 [ffff880311b5fc60] wait_for_common at ffffffff814efa13
       #3 [ffff880311b5fcf0] wait_for_completion at ffffffff814efb2d
       #4 [ffff880311b5fd00] target_stop_recovery_thread at ffffffffa063b360 [ptlrpc]
       #5 [ffff880311b5fd20] filter_iocontrol at ffffffffa0b07ceb [obdfilter]
       #6 [ffff880311b5fd90] class_handle_ioctl at ffffffffa0509c37 [obdclass]
       #7 [ffff880311b5fe40] obd_class_ioctl at ffffffffa04fa21b [obdclass]
       #8 [ffff880311b5fe60] vfs_ioctl at ffffffff8118ab72
       #9 [ffff880311b5fea0] do_vfs_ioctl at ffffffff8118ad14
      #10 [ffff880311b5ff30] sys_ioctl at ffffffff8118b291
      #11 [ffff880311b5ff80] system_call_fastpath at ffffffff8100b0f2
      

      Also, the tgt_recov backtrace:

      PID: 23416  TASK: ffff88033603b500  CPU: 15  COMMAND: "tgt_recov"
       #0 [ffff880310bd38f0] schedule at ffffffff814eeee0
       #1 [ffff880310bd39b8] schedule_timeout at ffffffff814efd12
       #2 [ffff880310bd3a68] cfs_waitq_timedwait at ffffffffa0422521 [libcfs]
       #3 [ffff880310bd3a78] target_bulk_io at ffffffffa0641ea0 [ptlrpc]
       #4 [ffff880310bd3b48] ost_brw_write at ffffffffa0abc21b [ost]
       #5 [ffff880310bd3cb8] ost_handle at ffffffffa0abf0e8 [ost]
       #6 [ffff880310bd3de8] handle_recovery_req at ffffffffa063bcac [ptlrpc]
       #7 [ffff880310bd3e28] target_recovery_thread at ffffffffa063c0b8 [ptlrpc]
       #8 [ffff880310bd3f48] kernel_thread at ffffffff8100c14a
      

      Also attaching the complete 'foreach bt' output from crash.
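
      The two backtraces show the shape of the hang: the lctl thread blocks in wait_for_completion() inside target_stop_recovery_thread(), waiting for tgt_recov to exit, while tgt_recov itself sits in target_bulk_io() waiting on a replayed bulk write that never completes. Below is a minimal userspace sketch of that dependency (pthreads, hypothetical function names; not the Lustre code). The real recovery thread never wakes up, so the abort side waits forever; here a two-second timeout is used so the example terminates.

      #include <pthread.h>
      #include <stdio.h>
      #include <time.h>

      static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  bulk_done = PTHREAD_COND_INITIALIZER;

      /* Stands in for tgt_recov: parked in "target_bulk_io", waiting for a
       * bulk transfer that will never complete. */
      static void *recovery_thread(void *arg)
      {
              struct timespec deadline;

              clock_gettime(CLOCK_REALTIME, &deadline);
              deadline.tv_sec += 2;        /* the real wait is effectively unbounded */

              pthread_mutex_lock(&lock);
              pthread_cond_timedwait(&bulk_done, &lock, &deadline);
              pthread_mutex_unlock(&lock);
              return NULL;
      }

      int main(void)
      {
              pthread_t recov;

              pthread_create(&recov, NULL, recovery_thread, NULL);

              /* Stands in for lctl abort_recovery -> target_stop_recovery_thread():
               * it cannot return until the recovery thread has exited. */
              printf("abort_recovery: waiting for the recovery thread to stop...\n");
              pthread_join(recov, NULL);
              printf("abort_recovery: recovery thread stopped\n");
              return 0;
      }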

      LLNL-bugzilla-ID: 1607

      Activity

          Hongchao Zhang added a comment -

          Hi Mikhail,

          Is the patch still needed, and should it be updated against master? Thanks!

          Hongchao Zhang added a comment -

          The main problem here is that if LNetPut or LNetGet takes a long time (longer than srv_at_estimate) to transfer the data, the client still sees the AT early reply and continues to wait instead of failing the request.

          Mikhail Pershin added a comment -

          This looks like just another LU-1166 case. As for the replay request issue, I don't see a problem here: why is the request waiting, and what is it waiting for? ptlrpc_at_send_early_reply() is about sending an early reply; it doesn't affect the request processing time on the server. Can you explain a bit more what problem you see here?
          Hongchao Zhang added a comment -

          The patch is tracked at http://review.whamcloud.com/#change,2720

          Hongchao Zhang added a comment (edited) -

          LU-1352 should be caused by the negative recovery time, and this issue is related to it, but it may contain a new problem; I will attach a possible patch for it.
          Oleg Drokin added a comment -

          Hm, negative recovery time.
          I saw a couple of bugs like this.
          Might be a dup of LU-1166.

          Hongchao Zhang added a comment -

          During recovery, the request will be kept forever if the server can't complete it. In ptlrpc_at_send_early_reply:

          ...
          if (req->rq_export &&
              lustre_msg_get_flags(req->rq_reqmsg) &
              (MSG_REPLAY | MSG_REQ_REPLAY_DONE | MSG_LOCK_REPLAY_DONE)) {
                  /* During recovery, we don't want to send too many early
                   * replies, but on the other hand we want to make sure the
                   * client has enough time to resend if the rpc is lost. So
                   * during the recovery period send at least 4 early replies,
                   * spacing them every at_extra if we can. at_estimate should
                   * always equal this fixed value during recovery. */
                  at_measured(&svc->srv_at_estimate, min(at_extra,
                              req->rq_export->exp_obd->obd_recovery_timeout / 4));
          } else {
                  ...
          }
          newdl = cfs_time_current_sec() + at_get(&svc->srv_at_estimate);  <--- here, the new deadline is set
          ...

          And in target_bulk_io, the request is waited on until it reaches the request's deadline!
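
          To make this concrete, here is a toy userspace model (fake clock, hypothetical variable names; not the Lustre code) of the interaction described above: each early reply sent during recovery resets the request deadline to now + at_estimate, so the wait in target_bulk_io for a bulk transfer that never completes also never expires.

          /* Toy model, buildable as ordinary C. "now" is a fake clock in
           * seconds and at_estimate stands in for srv_at_estimate. */
          #include <stdio.h>

          int main(void)
          {
                  long now;
                  long at_estimate = 10;
                  long deadline = at_estimate;   /* newdl = now + at_estimate */

                  /* Run the fake clock for a minute; the bulk I/O never
                   * completes, and each early reply sent just before the
                   * deadline pushes the deadline forward again. */
                  for (now = 0; now <= 60; now++) {
                          if (deadline - now <= 1)
                                  deadline = now + at_estimate;
                  }

                  printf("after %ld seconds the deadline is still %ld seconds away\n",
                         now, deadline - now);
                  return 0;
          }

          As the comment above notes, target_bulk_io waits until the request deadline, and that deadline keeps moving, so the replayed bulk write seen in the tgt_recov backtrace is never failed.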
          Ned Bass (Inactive) added a comment (edited) -

          From the affected server's logs, note the negative recovery time:

          2012-05-03 13:24:54 Lustre: lsc-OST0132: Denying connection for new client 192.168.117.130@o2ib10 (at 9bbecaa5-b07c-320a-ef31-4620f1b26f3c), waiting for 7 clients in recovery for -12:-06

          This seems to support the comments in LU-1352. The time here is calculated as

          t = cfs_timer_deadline(&target->obd_recovery_timer);
          t = cfs_time_sub(t, cfs_time_current());

          so we must have cfs_time_current() greater than the obd_recovery_timer deadline.
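
          For illustration, a minimal standalone sketch (the format string is an assumption chosen to reproduce the observed output; this is not the Lustre code) of how a deadline that already lies in the past yields a string like -12:-06 once the remaining time is split into minutes and seconds:

          #include <stdio.h>
          #include <time.h>

          int main(void)
          {
                  time_t now       = time(NULL);
                  time_t deadline  = now - (12 * 60 + 6);  /* timer expired 12m06s ago */
                  long   remaining = (long)(deadline - now);

                  /* Splitting a negative remaining time into minutes and
                   * seconds gives two negative fields, matching the log:
                   * "waiting for clients in recovery for -12:-06" */
                  printf("waiting for clients in recovery for %ld:%.2ld\n",
                         remaining / 60, remaining % 60);
                  return 0;
          }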
          Peter Jones added a comment -

          Hi Hongchao,

          Could you please look into this one?

          Thanks

          Peter

          People

            Assignee: Hongchao Zhang
            Reporter: Ned Bass (Inactive)