  1. Lustre
  2. LU-17190

Client-side high priority I/O handling under lock blocking AST


Details

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Minor

    Description

      We found a deadlock caused by parallel DIO:

      T1: writer
        Obtains DLM extent lock L1=PW[0, EOF].
      T2: DIO reader (50M data, iosize=64M, max_pages_per_rpc=1024 (4M), max_rpcs_in_flight=8)
        ll_direct_IO_impl()
          uses all available RPC slots: the number of read RPCs in flight is 9
        on the server side:
          ->tgt_brw_read()
          ->tgt_brw_lock() # server-side locking
          -> tries to cancel the conflicting lock on the client: L1=PW[0, EOF]
      T3: reader
        Takes a DLM lock reference on L1=PW[0, EOF].
        Prepares read-ahead pages.
        Waits for RPC slots to send the read RPCs to the OST.
      Deadlock:
        T2->T3: T2 is waiting for T3 to release DLM extent lock L1;
        T3->T2: T3 is waiting for T2 to finish and free RPC slots.
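
The circular wait in this scenario can be illustrated with a toy wait-for graph. This is only a sketch of the reasoning, not Lustre code: the thread names T2 and T3 come from the description above, while `find_cycle` and the graph representation are hypothetical helpers introduced here for illustration.

```python
# Toy model of the wait-for cycle; not Lustre code.
# find_cycle() and the dict-based graph are assumptions made for illustration.

def find_cycle(wait_for):
    """Return the nodes forming a cycle in a wait-for graph, or None.

    wait_for maps each waiter to the single party it is blocked on.
    """
    for start in wait_for:
        seen = []
        node = start
        # Follow the chain of "is blocked on" edges until it ends or repeats.
        while node in wait_for and node not in seen:
            seen.append(node)
            node = wait_for[node]
        if node in seen:
            return seen[seen.index(node):]
    return None


# T2's server-side read needs L1 cancelled, but T3 holds a reference on L1.
# T3 needs an RPC slot to finish its read, but T2 occupies every slot.
wait_for = {
    "T2": "T3",  # T2 waits for T3 to release DLM extent lock L1
    "T3": "T2",  # T3 waits for T2 to finish and free RPC slots
}

cycle = find_cycle(wait_for)
print("deadlock cycle:", cycle)
```

Breaking either edge removes the cycle; the proposal below targets the T3->T2 edge by letting T3's I/O go out without waiting for a free slot.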
      


      To solve this problem, we propose client-side high-priority handling for I/O whose protecting extent lock is under a blocking AST.

      It is implemented as follows:

      When the client receives a lock blocking AST and the lock is in use (its reader or writer count is nonzero), it checks whether any I/O extent (osc_extent) protected by this lock is outstanding (i.e. waiting for an RPC slot). Such read/write I/O is marked as high priority and put on the HP list. The client then forces the HP I/Os to be sent even when the available RPC slots are used up.

      This makes the I/O engine in the OSC layer more efficient. For normal urgent I/O, the client iterates over the object list and sends the I/O one by one. Moreover, the in-flight I/O count cannot exceed the maximum number of RPCs in flight (max_rpcs_in_flight).

      High-priority I/Os, in contrast, are put on the client's HP list and are handled more quickly.

      This avoids the possible deadlock caused by parallel DIO and lets the client respond to the lock blocking AST more quickly.
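
The mechanism described above can be sketched as a small model. This is a simplified illustration under stated assumptions, not the actual OSC implementation: `Extent`, `Client`, `on_blocking_ast`, `send_ready`, and the field names are all hypothetical stand-ins (only osc_extent, the HP list, and max_rpcs_in_flight are named in the description), and real RPC completion is not modeled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Extent:
    """Hypothetical stand-in for an osc_extent protected by one DLM lock."""
    name: str
    lock: str
    high_priority: bool = False


@dataclass
class Client:
    """Hypothetical model of a client's per-target I/O state."""
    max_rpcs_in_flight: int
    in_flight: int = 0
    pending: deque = field(default_factory=deque)  # waiting for an RPC slot
    hp_list: deque = field(default_factory=deque)  # high-priority extents
    sent: list = field(default_factory=list)

    def on_blocking_ast(self, lock, lock_in_use):
        """Move pending extents protected by `lock` to the HP list."""
        if not lock_in_use:
            return  # no readers or writers: nothing is waiting under the lock
        keep = deque()
        for ext in self.pending:
            if ext.lock == lock:
                ext.high_priority = True
                self.hp_list.append(ext)
            else:
                keep.append(ext)
        self.pending = keep

    def send_ready(self):
        """Send HP extents regardless of slots, then normal ones within the limit."""
        while self.hp_list:
            self._send(self.hp_list.popleft())
        while self.pending and self.in_flight < self.max_rpcs_in_flight:
            self._send(self.pending.popleft())

    def _send(self, ext):
        self.in_flight += 1
        self.sent.append(ext.name)


# All slots are already taken, as in the T2 scenario.
client = Client(max_rpcs_in_flight=8, in_flight=8)
client.pending.extend([
    Extent("readahead-0", "L1"),
    Extent("readahead-1", "L1"),
    Extent("other", "L2"),
])

client.send_ready()                 # nothing can go out: no free slots
sent_before_ast = list(client.sent)

client.on_blocking_ast("L1", lock_in_use=True)
client.send_ready()                 # extents under L1 bypass the slot limit
print(sent_before_ast, client.sent)
```

In this model the extents protected by L1 are sent even though every slot is occupied, while the unrelated extent under L2 keeps waiting for a normal slot.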

          People

            qian_wc Qian Yingjin