  1. Lustre
  2. LU-15118

There isn't any free thread to process resend request

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0

    Description

      The open request was processed, but the reply was lost:

      00010000:00010000:0.0:1602492290.594418:0:14205:0:(ldlm_lockd.c:1465:ldlm_handle_enqueue0()) ### server-side enqueue handler, sending reply(err=0, rc=0) ns: mdt-lustre-MDT0002_UUID lock: ffff8c0a8d93cb40/0xecdbb64c9917086c lrc: 3/0,0 mode: CR/CR res: [0x280000be0:0xea4:0x0].0x0 bits 0x9/0x0 rrc: 2 type: IBT gid 0 flags: 0x40200000000000 nid: 192.168.3.8@tcp remote: 0xfb5bd69624c1a4d4 expref: 881 pid: 14205 timeout: 0 lvb_type: 0
      00010000:00010000:0.0:1602492290.594434:0:14205:0:(ldlm_lockd.c:1544:ldlm_handle_enqueue0()) ### server-side enqueue handler END (lock ffff8c0a8d93cb40, rc 0)
      00010000:00000200:0.0:1602492290.594440:0:14205:0:(ldlm_lib.c:2989:target_send_reply_msg()) @@@ sending reply  req@ffff8c0a93630d80 x1680332092323520/t8590872076(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:25/0 lens 808/680 e 0 to 0 dl 1602492351 ref 1 fl Interpret:/0/0 rc 0/0 job:'cp.0'
      00000100:00100000:0.0:1602492290.594465:0:14205:0:(service.c:2278:ptlrpc_server_handle_request()) Handled RPC req@ffff8c0a93630d80 pname:cluuid+ref:pid:xid:nid:opc:job mdt00_039:649f59bb-d9ab-a7a1-b0ba-b64b3f540924+882:15694:x1680332092323520:12345-192.168.3.8@tcp:101:cp.0 Request processed in 56792us (56898us total) trans 8590872076 rc 0/0
      
      00000100:00100000:1.0:1602492353.522406:0:14193:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520
        req@ffff8c0b83694d80 x1680332092323520/t0(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:27/0 lens 808/0 e 0 to 0 dl 1602492414 ref 2 fl New:/2/ffffffff rc 0/-1 job:'cp.0'
      
      00000100:00100000:1.0:1602492415.536875:0:14178:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520
      00000100:00080000:1.1:1602492415.536881:0:14178:0:(service.c:1628:ptlrpc_server_check_resend_in_progress()) @@@ Found duplicate req in processing  req@ffff8c0a91752400 x1680332092323520/t0(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:28/0 lens 808/0 e 0 to 0 dl 1602492476 ref 1 fl New:/2/ffffffff rc 0/-1 job:'cp.0'
      00000100:00080000:1.1:1602492415.536888:0:14178:0:(service.c:1629:ptlrpc_server_check_resend_in_progress()) @@@ Request being processed  req@ffff8c0b83694d80 x1680332092323520/t0(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:27/0 lens 808/0 e 0 to 0 dl 1602492414 ref 1 fl New:/2/ffffffff rc 0/-1 job:'cp.0'
      
      00000100:00100000:1.0:1602492477.550365:0:14178:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520
      00000100:00080000:1.1:1602492477.550388:0:14178:0:(service.c:1628:ptlrpc_server_check_resend_in_progress()) @@@ Found duplicate req in processing  req@ffff8c0a8e7f7a80 x1680332092323520/t0(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:29/0 lens 808/0 e 0 to 0 dl 1602492538 ref 1 fl New:/2/ffffffff rc 0/-1 job:'cp.0'
      00000100:00080000:1.1:1602492477.550398:0:14178:0:(service.c:1629:ptlrpc_server_check_resend_in_progress()) @@@ Request being processed  req@ffff8c0b83694d80 x1680332092323520/t0(0) o101->649f59bb-d9ab-a7a1-b0ba-b64b3f540924@192.168.3.8@tcp:27/0 lens 808/0 e 0 to 0 dl 1602492476 ref 1 fl New:/2/ffffffff rc 0/-1 job:'cp.0'
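
      The resend pattern in the log above can be checked by grouping the "got req" lines by xid and differencing their timestamps. The following is a minimal sketch, assuming the colon-separated header layout seen in the excerpt, where the fourth field is the timestamp. The function name `resend_gaps` and both regular expressions are illustrative and are not part of Lustre.

```python
import re
from collections import defaultdict

# Header of a debug log line as it appears in the excerpt:
# subsystem:mask:cpu:timestamp:...  (field meaning assumed from the excerpt)
HEADER = re.compile(r"^\s*[0-9a-f]{8}:[0-9a-f]{8}:[\d.]+:(\d+\.\d+):")
GOT_REQ = re.compile(r"got req x(\d+)")


def resend_gaps(lines):
    """Return {xid: [seconds between consecutive 'got req' arrivals]}."""
    arrivals = defaultdict(list)
    for line in lines:
        head = HEADER.match(line)
        req = GOT_REQ.search(line)
        if head and req:
            arrivals[req.group(1)].append(float(head.group(1)))
    return {
        xid: [round(later - earlier, 3) for earlier, later in zip(ts, ts[1:])]
        for xid, ts in arrivals.items()
    }


# The three "got req" lines from the excerpt above.
sample = [
    "00000100:00100000:1.0:1602492353.522406:0:14193:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520",
    "00000100:00100000:1.0:1602492415.536875:0:14178:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520",
    "00000100:00100000:1.0:1602492477.550365:0:14178:0:(service.c:2075:ptlrpc_server_handle_req_in()) got req x1680332092323520",
]
gaps = resend_gaps(sample)
print(gaps)
```

      For the excerpt, the same xid arrives three times roughly 62 seconds apart, which matches the `dl` deadlines advancing by 62 (1602492414, 1602492476, 1602492538).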
      

      The file unlink is blocked by the open that is in the resend state:

      __schedule
      schedule
      schedule_timeout
      ldlm_completion_ast
      ldlm_cli_enqueue_local
      mdt_object_local_lock
      mdt_object_lock_internal
      mdt_reint_object_lock
      mdt_reint_striped_lock
      mdt_reint_unlink
      mdt_reint_rec
      mdt_reint_internal
      mdt_reint
      tgt_request_handle
      ptlrpc_server_handle_request
      ptlrpc_main
      kthread
      Progs:  14668 "mdt00_060"
      

      All other mdt threads are waiting for getattr on the parent directory:

      __schedule
      schedule
      schedule_timeout
      ldlm_completion_ast
      ldlm_cli_enqueue_local
      mdt_object_local_lock
      mdt_object_lock_internal
      mdt_getattr_name_lock
      mdt_intent_getattr
      mdt_intent_opc
      mdt_intent_policy
      ldlm_lock_enqueue
      ldlm_handle_enqueue0
      tgt_enqueue
      tgt_request_handle
      ptlrpc_server_handle_request
      ptlrpc_main
      kthread
      Progs:  13713 "mdt00_000" 13714 "mdt00_001" 13715 "mdt00_002" 14101 "mdt00_003" 14155 "mdt00_004" 14156 "mdt00_005" 14157 "mdt00_006" 14158 "mdt00_007" 14160 "mdt00_008" 14161 "mdt00_009" 14162 "mdt00_010" 14163 "mdt00_011" 14165 "mdt00_012" 14166 "mdt00_013" 14170 "mdt00_014" 14171 "mdt00_015" 14172 "mdt00_016" 14173 "mdt00_017" 14174 "mdt00_018" 14175 "mdt00_019" 14176 "mdt00_020" 14177 "mdt00_021" 14179 "mdt00_023" 14180 "mdt00_024" 14181 "mdt00_025" 14182 "mdt00_026" 14183 "mdt00_027" 14184 "mdt00_028" 14189 "mdt00_029" 14190 "mdt00_030" 14191 "mdt00_031" 14192 "mdt00_032" 14194 "mdt00_034" 14195 "mdt00_035" 14196 "mdt00_036" 14203 "mdt00_037" 14204 "mdt00_038" 14205 "mdt00_039" 14206 "mdt00_040" 14210 "mdt00_041" 14211 "mdt00_042" 14212 "mdt00_043" 14213 "mdt00_044" 14214 "mdt00_045" 14215 "mdt00_046" 14216 "mdt00_047" 14217 "mdt00_048" 14655 "mdt00_049" 14656 "mdt00_050" 14658 "mdt00_051" 14659 "mdt00_052" 14660 "mdt00_053" 14661 "mdt00_054" 14662 "mdt00_055" 14663 "mdt00_056" 14664 "mdt00_057" 14665 "mdt00_058" 14667 "mdt00_059" 14669 "mdt00_061" 14670 "mdt00_062" 14671 "mdt00_063" 14672 "mdt00_064" 14673 "mdt00_065" 14674 "mdt00_066" 14675 "mdt00_067" 14676 "mdt00_068" 14677 "mdt00_069" 14678 "mdt00_070" 14679 "mdt00_071" 14680 "mdt00_072" 14700 "mdt00_073" 14701 "mdt00_074" 14702 "mdt00_075" 14703 "mdt00_076" 14704 "mdt00_077" 14705 "mdt00_078" 14706 "mdt00_079"
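
      To see which service threads are absent from the getattr list, the Progs line can be parsed for gaps in the thread numbering. This is a minimal sketch, assuming the `pid "name"` pair format shown above. The function name `gaps_in_thread_list` and the regular expression are illustrative only.

```python
import re

# One entry on a Progs line looks like: 14176 "mdt00_020"
PROG = re.compile(r'(\d+)\s+"(mdt\d+_(\d+))"')


def gaps_in_thread_list(progs_line):
    """Return thread indices missing between the lowest and highest index seen."""
    seen = sorted(int(idx) for _pid, _name, idx in PROG.findall(progs_line))
    if not seen:
        return []
    return sorted(set(range(seen[0], seen[-1] + 1)) - set(seen))


# A short slice of the Progs line above, around the first gap.
sample = 'Progs:  14176 "mdt00_020" 14177 "mdt00_021" 14179 "mdt00_023" 14180 "mdt00_024"'
missing = gaps_in_thread_list(sample)
print(missing)
```

      Run against the full Progs line, this reports indices 22, 33 and 60 as absent. mdt00_060 is the thread blocked in the unlink trace shown earlier.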
      

            People

              askulysh Andriy Skulysh