  Lustre / LU-12347

lustre write: do not send enqueue RPC while holding an osc/mdc ldlm lock


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Lustre 2.15.0
    • None
    • 3

    Description

      Lustre's write path should not send an enqueue RPC to the MDS while it holds an OSC or MDC LDLM lock. Currently this can happen via the following call path:

          cl_io_loop
            cl_io_lock                    <- ldlm lock is taken here
            cl_io_start
              vvp_io_write_start
              ...
                __generic_file_aio_write
                  file_remove_privs
                    security_inode_need_killpriv
                    ...
                      ll_xattr_get_common
                      ...
                        mdc_intent_lock   <- enqueue rpc is sent here
            cl_io_unlock                  <- ldlm lock is released
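      To make the rule concrete, the sketch below counts the LDLM locks an I/O holds and checks that count before any step that may send an enqueue RPC. It is a hypothetical illustration only: `struct io_ctx`, `io_lock`, `io_unlock`, `may_send_enqueue` and the two `write_xattr_*` helpers are invented names, not Lustre APIs, and moving the xattr lookup ahead of the lock is shown as one possible ordering, not as the patch that resolved this issue.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-I/O bookkeeping (not a Lustre structure): the number of
 * osc/mdc ldlm locks taken by the lock step and not yet released. */
struct io_ctx {
    int ldlm_locks_held;
};

/* Hypothetical stand-ins for the lock/unlock steps of cl_io_loop. */
static void io_lock(struct io_ctx *io)   { io->ldlm_locks_held++; }
static void io_unlock(struct io_ctx *io) { io->ldlm_locks_held--; }

/* Hypothetical guard in front of anything that may send an enqueue RPC
 * (e.g. an xattr fetch that ends up in mdc_intent_lock). Returns true if
 * the RPC may be sent, false if an ldlm lock is currently held. */
static bool may_send_enqueue(const struct io_ctx *io, const char *caller)
{
    if (io->ldlm_locks_held > 0) {
        fprintf(stderr, "%s: enqueue RPC with %d ldlm lock(s) held\n",
                caller, io->ldlm_locks_held);
        return false;
    }
    return true;
}

/* Ordering from the trace above: lock first, then the xattr RPC. */
static bool write_xattr_under_lock(struct io_ctx *io)
{
    io_lock(io);                                     /* like cl_io_lock   */
    bool ok = may_send_enqueue(io, "ll_xattr_get_common");
    io_unlock(io);                                   /* like cl_io_unlock */
    return ok;
}

/* Illustrative reordering: do the xattr RPC before taking the lock. */
static bool write_xattr_before_lock(struct io_ctx *io)
{
    bool ok = may_send_enqueue(io, "ll_xattr_get_common");
    io_lock(io);
    io_unlock(io);
    return ok;
}
```

      With a zeroed `struct io_ctx`, `write_xattr_under_lock()` reports the violation and returns false, while `write_xattr_before_lock()` returns true; in both cases the lock count is back to zero afterwards.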
      

      This can lead to client eviction. The following scenario has been observed under a write load involving DoM (Data-on-MDT):

      • A write holds an MDC LDLM lock (L1) and, while calling
        ll_xattr_get_common(), waits for a free RPC slot in
        obd_get_request_slot().
      • All RPC slots are occupied by write processes that are waiting for
        their enqueue RPCs to complete.
      • To serve those enqueue requests, the MDS sends a blocking AST for
        L1; because the client never cancels L1, the MDS eventually evicts
        the client.
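      The cycle in these three steps can be reproduced with a toy, single-threaded model. Everything in it is a hypothetical simplification: `struct slot_model` and `simulate` are invented names, `max_slots` merely stands in for the client's limit on RPCs in flight, and "stuck" represents the state the real system escapes only by evicting the client.

```c
#include <stdbool.h>

/* Toy model of the DoM write scenario (hypothetical, not Lustre code). */
struct slot_model {
    int  max_slots;        /* stand-in for the client's RPC-in-flight limit  */
    int  slots_in_use;     /* each held by a write whose enqueue waits on L1 */
    bool l1_held;          /* the lock holder still holds L1                 */
    bool holder_needs_rpc; /* holder must send the xattr RPC before unlocking */
};

/* Returns true if all work completes, false if no party can make progress
 * (the point at which the MDS would end up evicting the client). */
static bool simulate(struct slot_model m)
{
    for (;;) {
        if (m.holder_needs_rpc) {
            if (m.slots_in_use == m.max_slots)
                return false;           /* holder waits for a slot forever */
            m.holder_needs_rpc = false; /* xattr RPC sent and completed    */
            continue;
        }
        if (m.l1_held) {
            m.l1_held = false;          /* write finishes, L1 released     */
            continue;
        }
        if (m.slots_in_use > 0) {
            m.slots_in_use--;           /* a blocked enqueue is granted    */
            continue;
        }
        return true;                    /* everything drained              */
    }
}
```

      For example, `simulate((struct slot_model){8, 8, true, true})` returns false (all slots busy, holder still needs an RPC), whereas either leaving one slot free or letting the holder finish its xattr RPC before taking L1 (`holder_needs_rpc = false`) lets the model drain and return true.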

      A second, more complex scenario caused by the same problem has also been observed: clients were evicted by OSTs during mdtest+ior+failover hardware testing.


            People

              Assignee: Vladimir Saveliev (vsaveliev)
              Reporter: Vladimir Saveliev (vsaveliev)
              Votes: 0
              Watchers: 7
