[LU-5319] Support multiple slots per client in last_rcvd file

Details

    • 14856

    Description

      While running the mdtest benchmark, I observed that file creation and unlink operations from a single Lustre client quickly saturate at around 8000 IOPS: the maximum is reached with as few as 4 parallel tasks.
      When using several Lustre mount points on a single client node, the file creation and unlink rates do scale with the number of tasks, up to the 16 cores of my client node.

      Looking at the code, it appears that most metadata operations are serialized by a mutex in the MDC layer.
      In the mdc_reint() routine, request posting is protected by mdc_get_rpc_lock() and mdc_put_rpc_lock(), where the lock is:
      struct client_obd -> struct mdc_rpc_lock *cl_rpc_lock -> struct mutex rpcl_mutex.
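
The serialization described above can be pictured with a small user-space model. This is only a sketch, not Lustre code: the function `peak_in_flight` and its parameters are hypothetical, a semaphore with a limit of 1 stands in for rpcl_mutex, and a short sleep stands in for the round trip to the MDS.

```python
import threading
import time


def peak_in_flight(num_tasks, max_in_flight, ops_per_task=20):
    """Return the highest number of simulated modify RPCs in flight at once."""
    # A limit of 1 behaves like a mutex taken around request posting.
    gate = threading.BoundedSemaphore(max_in_flight)
    counter_lock = threading.Lock()
    state = {"current": 0, "peak": 0}

    def task():
        for _ in range(ops_per_task):
            with gate:  # entering stands in for mdc_get_rpc_lock()
                with counter_lock:
                    state["current"] += 1
                    state["peak"] = max(state["peak"], state["current"])
                time.sleep(0.0005)  # simulated round trip to the server
                with counter_lock:
                    state["current"] -= 1
            # leaving the block stands in for mdc_put_rpc_lock()

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print("limit 1:", peak_in_flight(num_tasks=4, max_in_flight=1))
    print("limit 4:", peak_in_flight(num_tasks=4, max_in_flight=4))
```

With a limit of 1, the peak never exceeds one request regardless of the number of tasks, which matches the early saturation seen with mdtest. Each additional mount point would have its own lock in this model, which is why several mount points scale.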

      After an email discussion with Andreas Dilger, it appears that the limitation is actually on the MDS, since it cannot handle more than one filesystem-modifying RPC per client at a time. There is only one slot in the MDT last_rcvd file for each client to save the state for the reply in case it is lost.

      The aim of this ticket is to implement multiple slots per client in the last_rcvd file so that several filesystem-modifying RPCs can be handled in parallel.
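
To illustrate why the number of slots bounds the number of modifying RPCs that can safely be in flight, here is a toy in-memory model. It is a sketch under stated assumptions, not the on-disk last_rcvd format or the MDT code: the class `ToyMDT`, the helper `replay_scenario`, keying saved replies by XID, and overwriting the oldest slot when full are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class ToyMDT:
    """Toy server that saves a bounded number of replies per client."""

    def __init__(self, slots_per_client):
        self.slots_per_client = slots_per_client
        self.saved = {}     # client_id -> OrderedDict mapping xid -> reply
        self.executed = []  # operations actually applied to the filesystem

    def handle(self, client_id, xid, op):
        slots = self.saved.setdefault(client_id, OrderedDict())
        if xid in slots:
            # Resent request: return the saved reply without re-executing.
            return slots[xid]
        self.executed.append(op)
        reply = "done:" + op
        if len(slots) >= self.slots_per_client:
            # Assumption: the oldest saved reply is overwritten when full.
            slots.popitem(last=False)
        slots[xid] = reply
        return reply


def replay_scenario(slots_per_client):
    """Two requests in flight; the reply to the first is lost and resent."""
    mdt = ToyMDT(slots_per_client)
    mdt.handle("client1", 1, "create a")  # reply assumed lost on the wire
    mdt.handle("client1", 2, "create b")
    mdt.handle("client1", 1, "create a")  # client resends XID 1
    return mdt.executed


if __name__ == "__main__":
    print("1 slot :", replay_scenario(1))
    print("2 slots:", replay_scenario(2))
```

In this model, a single slot cannot answer the resend of XID 1 because the slot already holds the reply for XID 2, so the operation is applied twice. With two slots the saved reply is returned and the operation is applied once. This is the reason a client restricted to one slot has to serialize its modifying requests.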

      Single-client metadata performance should improve significantly while still ensuring a safe recovery mechanism.

          Activity

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14861/
            Subject: LU-5319 tests: testcases for multiple modify RPCs feature
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c2d27a0f12688c0d029880919f8b002e557b540c

            gerrit Gerrit Updater added a comment

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14862/
            Subject: LU-5319 utils: update lr_reader to display additional data
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6460ae59bf6e5175797dc66ecbe560eebc8b6333

            gerrit Gerrit Updater added a comment

            Here is version 2 of the test plan document, updated with test results.

            pichong Gregoire Pichon added a comment

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14860/
            Subject: LU-5319 mdt: support multiple modify RCPs in parallel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5fc7aa3687daca5c14b0e479c58146e0987daf7f

            gerrit Gerrit Updater added a comment

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14374/
            Subject: LU-5319 mdc: manage number of modify RPCs in flight
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1fc013f90175d1e50d7a22b404ad6abd31a43e38

            gerrit Gerrit Updater added a comment

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14793/
            Subject: LU-5319 ptlrpc: embed highest XID in each request
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: bf3e7f67cb33f3b4e0590ef8af3843ac53d0a4e8

            gerrit Gerrit Updater added a comment

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14153/
            Subject: LU-5319 mdc: add max modify RPCs in flight variable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 60c05ea9f66f9bd3f5fd35942a12edb1e311c455

            gerrit Gerrit Updater added a comment

            Grégoire, it looks like your patches are regularly causing conf-sanity test_32[abd] to fail or time out. This is not the case with other patches being tested recently, so it looks like there is a regression in your patches 14860 (test fail) and 14861 (test timeout).

            One failure in 14860 is:

            LustreError: 14561:0:(class_obd.c:684:cleanup_obdclass()) obd_memory max: 53634235, leaked: 152
            shadow-10vm8: 
            shadow-10vm8: Memory leaks detected
            

            Could you please investigate?

            adilger Andreas Dilger added a comment

            Grégoire Pichon (gregoire.pichon@bull.net) uploaded a new patch: http://review.whamcloud.com/14862
            Subject: LU-5319 utils: update lr_reader to display additional data
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0738702663851f3d01f45b1725df2c623b507bb9

            gerrit Gerrit Updater added a comment

            Grégoire Pichon (gregoire.pichon@bull.net) uploaded a new patch: http://review.whamcloud.com/14861
            Subject: LU-5319 tests: testcases for multiple modify RPCs feature
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1fcb722e9b0b22a0cd5ae4640a1abf146efd1877

            gerrit Gerrit Updater added a comment

            Grégoire Pichon (gregoire.pichon@bull.net) uploaded a new patch: http://review.whamcloud.com/14860
            Subject: LU-5319 mdt: support multiple modify RCPs in parallel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1d9870b493954d53d8cbb1ff994a99817df0b2d7

            gerrit Gerrit Updater added a comment

            People

              bzzz Alex Zhuravlev
              pichong Gregoire Pichon