
Support multiple slots per client in last_rcvd file

Details


    Description

      While running the mdtest benchmark, I have observed that file creation and unlink operations from a single Lustre client quickly saturate at around 8000 IOPS: the maximum is already reached with only 4 tasks in parallel.
      When using several Lustre mount points on a single client node, the file creation and unlink rates do scale with the number of tasks, up to the 16 cores of my client node.

      Looking at the code, it appears that most metadata operations are serialized by a mutex in the MDC layer.
      In the mdc_reint() routine, request posting is protected by mdc_get_rpc_lock() and mdc_put_rpc_lock(), where the lock is:
      struct client_obd -> struct mdc_rpc_lock *cl_rpc_lock -> struct mutex rpcl_mutex.
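
      For illustration, here is a minimal userspace sketch of the serialization described above (the *_sketch names are hypothetical; this is not the actual Lustre kernel code): a single per-client mutex is taken around every filesystem-modifying request, in the spirit of the mdc_get_rpc_lock()/mdc_put_rpc_lock() calls in mdc_reint().

      #include <pthread.h>

      struct mdc_rpc_lock_sketch {
              pthread_mutex_t rpcl_mutex;             /* one mutex per client_obd */
      };

      struct client_obd_sketch {
              struct mdc_rpc_lock_sketch *cl_rpc_lock;
      };

      static void mdc_get_rpc_lock_sketch(struct mdc_rpc_lock_sketch *lck)
      {
              pthread_mutex_lock(&lck->rpcl_mutex);
      }

      static void mdc_put_rpc_lock_sketch(struct mdc_rpc_lock_sketch *lck)
      {
              pthread_mutex_unlock(&lck->rpcl_mutex);
      }

      /* Every filesystem-modifying RPC goes through here, so at most one such
       * request per client is in flight at any time. */
      static int mdc_reint_sketch(struct client_obd_sketch *cli)
      {
              int rc = 0;

              mdc_get_rpc_lock_sketch(cli->cl_rpc_lock);
              /* ... post the reint request and wait for the reply ... */
              mdc_put_rpc_lock_sketch(cli->cl_rpc_lock);
              return rc;
      }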

      After an email discussion with Andreas Dilger, it appears that the limitation is actually on the MDS, since it cannot handle more than a single filesystem-modifying RPC at one time. There is only one slot in the MDT last_rcvd file for each client to save the state for the reply in case it is lost.

      The aim of this ticket is to implement multiple slots per client in the last_rcvd file so that several filesystem-modifying RPCs can be handled in parallel.
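
      As a rough illustration only (the field names and slot count below are hypothetical, not the on-disk format that was eventually landed), each per-client reply slot needs to record enough state to reconstruct a lost reply after a server restart, and each client owns several such slots instead of a single one:

      #include <stdint.h>

      #define REPLY_SLOTS_PER_CLIENT 8        /* hypothetical slot count */

      struct reply_slot_sketch {
              uint64_t rs_transno;     /* transaction number of the operation */
              uint64_t rs_xid;         /* xid of the client request being answered */
              uint64_t rs_data;        /* operation-specific data, e.g. an open handle */
              uint32_t rs_result;      /* status code returned to the client */
              uint32_t rs_client_gen;  /* generation of the owning client */
      };

      /* One of these per connected client, instead of the single last_rcvd slot. */
      struct client_reply_slots_sketch {
              struct reply_slot_sketch cs_slots[REPLY_SLOTS_PER_CLIENT];
      };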

      The single-client metadata performance should be significantly improved while still ensuring a safe recovery mechanism.

      Attachments

        Issue Links

          Activity

            [LU-5319] Support multiple slots per client in last_rcvd file

            niu Niu Yawei (Inactive) added a comment -

            The multi-slots implementation introduced a regression; see LU-5951.

            To be able to find the unreplied requests by scanning the sending/delayed lists, the current multi-slots implementation moved the xid assignment from the request packing stage to the request sending stage. However, that breaks the original mechanism that coordinated timestamp updates on OST objects in the presence of out-of-order operations such as setattr, truncate and write.

            To fix this regression, LU-5951 moved the xid assignment back to the request packing stage and introduced an unreplied list to track all unreplied requests. The following is a brief description of the LU-5951 patch:

            obd_import->imp_unreplied_list is introduced to track all unreplied requests. The list is sorted by xid, so the client can get the known maximal replied xid by checking the first element in the list.

            obd_import->imp_known_replied_xid is introduced for sanity-check purposes; it is updated along with imp_unreplied_list.
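
            As a small userspace sketch of the two fields above (illustrative only; the *_sketch names are hypothetical and the real code uses the kernel list primitives), requests are kept sorted by xid, so the head of the list bounds the known maximal replied xid:

            #include <stddef.h>
            #include <stdint.h>

            struct req_sketch {
                    uint64_t rq_xid;
                    struct req_sketch *rq_next;
            };

            struct import_sketch {
                    struct req_sketch *imp_unreplied_list;  /* sorted by ascending xid */
                    uint64_t imp_known_replied_xid;         /* cached for sanity checks */
            };

            /* Insert keeping the list ordered by xid: new requests normally append,
             * while replayed requests may land in the middle. */
            static void unreplied_add(struct import_sketch *imp, struct req_sketch *req)
            {
                    struct req_sketch **p = &imp->imp_unreplied_list;

                    while (*p != NULL && (*p)->rq_xid < req->rq_xid)
                            p = &(*p)->rq_next;
                    req->rq_next = *p;
                    *p = req;
            }

            /* Every xid below the smallest unreplied xid is known to be replied;
             * with an empty list, fall back to the cached value. */
            static uint64_t known_replied_xid(struct import_sketch *imp)
            {
                    if (imp->imp_unreplied_list != NULL)
                            return imp->imp_unreplied_list->rq_xid - 1;
                    return imp->imp_known_replied_xid;
            }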

            Once a request is built, it is inserted into the unreplied list; when the reply is seen by the client, or the request is about to be freed, it is removed from the list. Two tricky points are worth mentioning here:

            1. Replay requests need to be added back to the unreplied list before sending. Instead of adding them back one by one during replay, we chose to add them all back together before replay starts, which makes strict sanity checking easier and is less bug-prone (see the sketch after this list).

            2. The sanity check on the server side is strengthened significantly. To satisfy the stricter check, connect and disconnect requests no longer carry the known replied xid; see the comments in ptlrpc_send_new_req() for details.
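
            A minimal sketch of the lifecycle and of point 1, reusing the hypothetical types and helpers from the sketch above (again illustrative only, not the actual ptlrpc code):

            /* Remove a request once its reply is seen or it is about to be freed,
             * and refresh the cached value along with the list (simplified: an
             * empty list keeps the previous cached value). */
            static void unreplied_del(struct import_sketch *imp, struct req_sketch *req)
            {
                    struct req_sketch **p = &imp->imp_unreplied_list;

                    while (*p != NULL && *p != req)
                            p = &(*p)->rq_next;
                    if (*p == req)
                            *p = req->rq_next;
                    imp->imp_known_replied_xid = known_replied_xid(imp);
            }

            /* Point 1: put every replayed request back on the unreplied list in one
             * pass before any of them is resent, so the stricter sanity checks
             * always see a complete picture of what is still unreplied. */
            static void replay_add_all(struct import_sketch *imp,
                                       struct req_sketch **replay_reqs, size_t count)
            {
                    size_t i;

                    for (i = 0; i < count; i++)
                            unreplied_add(imp, replay_reqs[i]);
            }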

            pjones Peter Jones added a comment -

            Landed for 2.8


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14861/
            Subject: LU-5319 tests: testcases for multiple modify RPCs feature
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c2d27a0f12688c0d029880919f8b002e557b540c


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14862/
            Subject: LU-5319 utils: update lr_reader to display additional data
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6460ae59bf6e5175797dc66ecbe560eebc8b6333


            pichong Gregoire Pichon added a comment -

            Here is version 2 of the test plan document, updated with test results.


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14860/
            Subject: LU-5319 mdt: support multiple modify RCPs in parallel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5fc7aa3687daca5c14b0e479c58146e0987daf7f


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14374/
            Subject: LU-5319 mdc: manage number of modify RPCs in flight
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1fc013f90175d1e50d7a22b404ad6abd31a43e38


            People

              bzzz Alex Zhuravlev
              pichong Gregoire Pichon
              Votes: 0
              Watchers: 34
