Lustre / LU-18154

Client can permanently hang in lustre recovery on race condition in replay_locks


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.0, Lustre 2.15.6
    • Environment: Amazon Linux 2 4.14 or 5.10 kernel for server and client; the problem is not Linux-version specific and impacts all versions.
    • Severity: 3

    Description

      LU-14027 resolved several cases where Lustre recovery on a client can fail to progress due to race conditions that prevent replay lock RPC requests from being processed when the server import state changes during recovery. However, one race condition remains in which the following sequence of events still hangs recovery:

      T1: A lock replay RPC is added to set_new_requests for processing while the import state is REPLAY_LOCKS, incrementing the counter of replay requests in flight, imp->imp_replay_inflight.

      T2: An unrelated in-flight request fails (ptlrpc_replay_interpret) and calls ptlrpc_connect_import, setting the import state to CONNECTING to reinitiate the reconnect process with the server.

      T3: The thread sending the RPCs in set_new_requests to the server (in ptlrpc_check_set) evaluates the request from T1, finds that its send state of REPLAY_LOCKS does not match the server import state of CONNECTING, and so moves the RPC to a list of delayed RPCs to be sent later (imp->imp_delayed_list).

      T4: The state machine that moves a server import through the stages of recovery (CONNECTING->REPLAY->REPLAY_LOCKS->...->FULL) cannot move the import into the REPLAY_LOCKS state because the check imp->imp_replay_inflight > 0 fails due to T1. The RPC from T1 can never be sent, and recovery can never progress.

      Debug log from the impacted client, demonstrating the timeline above:

      grep "lock replay thread kp64dbev-OST0002-osc-ffff91e61d14d800" lustre_debug.txt -A 100000 -B 100 | grep -E "3261117|kp64dbev-OST0002_UUID: changing import|REPLAY_LOCKS != CONNECTING\) req@00000000c2780d3d x1804541258909120/t0\(0\) o101->kp64dbev-OST0002"
      00000100:00080000:19.0:1721359050.645983:0:9860:0:(import.c:83:import_set_state_nolock()) 00000000f6a15738 kp64dbev-OST0002_UUID: changing import state from CONNECTING to REPLAY
      00000100:00080000:19.0:1721359050.645985:0:9860:0:(import.c:83:import_set_state_nolock()) 00000000f6a15738 kp64dbev-OST0002_UUID: changing import state from REPLAY to REPLAY_LOCKS
      00010000:00080000:76.0:1721359050.646033:0:3261117:0:(ldlm_request.c:2438:ldlm_lock_replay_thread()) lock replay thread kp64dbev-OST0002-osc-ffff91e61d14d800 to kp64dbev-OST0002_UUID@10.200.159.15@tcp
      00010000:00080000:76.0:1721359050.699193:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 1(8)
      00010000:00080000:76.0:1721359050.699197:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 2(8)
      00010000:00080000:76.0:1721359050.699199:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 3(8)
      00010000:00080000:76.0:1721359050.699201:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 4(8)
      00010000:00080000:76.0:1721359050.699203:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 5(8)
      00010000:00080000:76.0:1721359050.699204:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 6(8)
      00010000:00080000:76.0:1721359050.699206:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 7(8)
      00010000:00080000:76.0:1721359050.699207:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 8(8)
      00010000:00080000:76.0:1721359050.699208:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 8(8)
      00010000:00080000:76.0:1721359348.930423:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 7(8)
      00010000:00080000:76.0:1721359348.930432:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 8(8)
      00010000:00080000:76.0:1721359348.930432:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 8(8)
      00000100:00080000:89.0:1721359348.948236:0:9860:0:(import.c:83:import_set_state_nolock()) 00000000f6a15738 kp64dbev-OST0002_UUID: changing import state from REPLAY_LOCKS to CONNECTING
      00000100:00080000:89.0:1721359348.948317:0:9860:0:(client.c:1610:ptlrpc_send_new_req()) @@@ req waiting for recovery: (REPLAY_LOCKS != CONNECTING) req@00000000c2780d3d x1804541258909120/t0(0) o101->kp64dbev-OST0002-osc-ffff91e61d14d800@10.200.159.15@tcp:28/4 lens 328/400 e 0 to 0 dl 0 ref 1 fl Rpc:W/40/ffffffff rc 0/-1
      00010000:00080000:77.0:1721359348.948487:0:3261117:0:(ldlm_request.c:2380:lock_can_replay()) check lock replay limit, inflights = 7(8)
      00000100:00080000:89.0:1721359348.961445:0:9860:0:(import.c:83:import_set_state_nolock()) 00000000f6a15738 kp64dbev-OST0002_UUID: changing import state from CONNECTING to REPLAY

      I will propose a patch that modifies T3 to cancel any lock replay request, rather than delay it, if the server import is in any state other than REPLAY_LOCKS. Canceling decrements imp->imp_replay_inflight, which allows import recovery to restart from the beginning and proceed through all of the states needed for recovery to succeed.


    People

      Assignee: wc-triage (WC Triage)
      Reporter: ropermar (Mark Roper)
      Votes: 0
      Watchers: 4