
OOM happens on OSS during Lustre recovery for more than 5000 clients

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.6
    • None
    • Environment: Server running with b2_7_fe
      Clients are a mix of IEEL3 (RH7/SCS5), 2.5.3.90 (RH6/AE4), 2.7.3 (CentOS7)
    • Severity: 3

    Description

      I have been on-site to work with Bruno Travouillon (Atos) on one of the crash-dumps they have.

      After joint analysis, it looks like a huge portion of memory is being consumed by "ptlrpc_request_buffer_desc" structures (17KB each due to the embedded request buffer; since kmalloc rounds allocations up to power-of-two slabs, each one lands in a 32KB slab, nearly doubling the effective footprint!).

      Looking at the relevant source code, it appears that these "ptlrpc_request_buffer_desc" structures can be allocated on demand by ptlrpc_check_rqbd_pool(), but are never freed until OST umount/stop, via ptlrpc_service_purge_all().

      This problem has caused several OSS failovers to fail due to OOM.
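
      As a minimal sketch of the grow-only pattern described above (the names echo the ptlrpc functions mentioned in this ticket, but the bodies are illustrative only, not the actual Lustre code):

        /* Illustrative sketch only -- not the actual Lustre code.
         * It shows why rqbd memory grows under load but is never
         * returned: the pool check only ever allocates. */
        #include <stdlib.h>

        struct rqbd {
                struct rqbd *next;
                char buffer[17 * 1024]; /* ~17KB embedded request buffer;
                                         * kmalloc rounds this up to a
                                         * 32KB power-of-two slab. */
        };

        struct service {
                struct rqbd *idle_rqbds;
                int nrqbds_total;
                int nrqbds_idle;
        };

        /* Called when the service runs low on idle buffers, e.g. while
         * thousands of recovering clients reconnect at once; compare
         * ptlrpc_check_rqbd_pool(). */
        static void check_rqbd_pool(struct service *svc, int low_water)
        {
                while (svc->nrqbds_idle < low_water) {
                        struct rqbd *rqbd = malloc(sizeof(*rqbd));

                        if (rqbd == NULL)
                                break;  /* OOM territory on a real OSS */
                        rqbd->next = svc->idle_rqbds;
                        svc->idle_rqbds = rqbd;
                        svc->nrqbds_idle++;
                        svc->nrqbds_total++;    /* only ever grows... */
                }
        }

        /* ...and the only shrink path is service shutdown, mirroring
         * ptlrpc_service_purge_all(). */
        static void purge_all(struct service *svc)
        {
                while (svc->idle_rqbds != NULL) {
                        struct rqbd *rqbd = svc->idle_rqbds;

                        svc->idle_rqbds = rqbd->next;
                        free(rqbd);
                }
                svc->nrqbds_idle = 0;
                svc->nrqbds_total = 0;
        }

      Under recovery load from thousands of reconnecting clients, the pool check keeps firing, so the total buffer count ratchets upward until shutdown.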

    Activity

            [LU-9372] OOM happens on OSS during Lustre recovery for more than 5000 clients

            Both patches from LU-10803 and LU-10826 are also must-have follow-ons to the LU-9372 series.

            bfaccini Bruno Faccini (Inactive) added a comment

            Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/31622
            Subject: LU-9372 ptlrpc: fix req_buffers_max and req_history_max setting
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: aa9005eb5c9e873e9e83619ff830ba848917f118

            gerrit Gerrit Updater added a comment
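
            As a rough illustration of the kind of bounds check such a setter needs, here is a hedged sketch; the struct, the names, and the specific rule (a non-zero cap below the initial buffer count is rejected) are assumptions for illustration, not necessarily what the patch above implements:

              #include <errno.h>

              /* Hypothetical configuration state, for illustration only. */
              struct service_conf {
                      int nbufs_init;      /* buffers posted at service start */
                      int req_buffers_max; /* 0 = unlimited */
              };

              static int set_req_buffers_max(struct service_conf *conf, int val)
              {
                      /* Assumed rule: a non-zero cap below the initial buffer
                       * count would starve the service, so reject it. */
                      if (val != 0 && val < conf->nbufs_init)
                              return -ERANGE;
                      conf->req_buffers_max = val;
                      return 0;
              }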

            Master patch https://review.whamcloud.com/31162 from LU-10603 is required to make the associated tunable visible to the external world, and thus to make the https://review.whamcloud.com/29064/ patch/feature usable.

            So just in case, Minh: any back-port of #29064 also requires back-porting #31162.

            bfaccini Bruno Faccini (Inactive) added a comment

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31108
            Subject: LU-9372 ptlrpc: allow to limit number of service's rqbds
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 69ad99bf62cf461df93419e57adb323a6d537e31

            gerrit Gerrit Updater added a comment

            Landed for 2.11

            pjones Peter Jones added a comment

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29064/
            Subject: LU-9372 ptlrpc: allow to limit number of service's rqbds
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d9e57a765e73e1bc3046124433eb6e2186f7e07c

            gerrit Gerrit Updater added a comment

            We allocate 90 GB of RAM and 8 CPU cores to each OSS. We can't allocate more resources per virtual guest in an SFA14KXE. The cores are hyper-threaded.
             

             oss# cat /proc/sys/lnet/cpu_partition_table
             0 : 0 1 2 3
             1 : 4 5 6 7
             2 : 8 9 10 11
             3 : 12 13 14 15

            Last time we hit OOM, the memory consumption of the OSS was at its maximum (90GB).

            bruno.travouillon Bruno Travouillon (Inactive) added a comment

            I couldn't find in the ticket how much RAM this OSS has for its 20 OSTs. I'm wondering if we are also having problems here with CPT allocations all happening on one CPT, hitting OOM while there is plenty of RAM available on a second CPT?

            adilger Andreas Dilger added a comment
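
            For background on the CPT concern: strict per-CPT allocation behaves like NUMA-node-local allocation, so one node can be exhausted while others stay free. A minimal kernel-style sketch of the general pattern, using the stock kzalloc_node() API as an illustrative stand-in for Lustre's CPT-aware allocators (not the actual ptlrpc code):

              #include <linux/slab.h>

              /* Node-pinned allocation: if every buffer of a service lands
               * on the same node (which is what strict per-CPT allocation
               * amounts to), that node can be exhausted while other nodes
               * still have free memory, and the allocator OOMs anyway. */
              static void *alloc_buffer_on_node(size_t size, int node)
              {
                      /* GFP_NOFS is typical on filesystem server paths. */
                      return kzalloc_node(size, GFP_NOFS, node);
              }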

            I was just talking about this problem, and I realized that I had never clearly stated in this ticket that the reason for the 32k allocation for each ptlrpc_rqbd (for a real size of 17k) is the patch for LU-4755 ("LU-4755 ptlrpc: enlarge OST_MAXREQSIZE for 4MB RPC").

            Since the way this size was chosen looks a bit empirical, we may also want to try 15k (+ payload size, thus leading to 16k) in order to halve the real consumed size.
            To get almost the same size reduction, we may also try using a dedicated kmem_cache/slab for the 17k ptlrpc_rqbd, keeping in mind that the kernel's slab merging may render it useless. (See the sketch after this comment.)

            bfaccini Bruno Faccini (Inactive) added a comment
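
            A minimal sketch of the dedicated-cache idea above, using the standard kmem_cache_create() API; the cache name and size are illustrative assumptions, and, as noted in the comment, slab merging may alias the cache back to a compatible generic one:

              #include <linux/slab.h>
              #include <linux/errno.h>

              #define RQBD_REAL_SIZE (17 * 1024)  /* illustrative ~17k object */

              static struct kmem_cache *rqbd_cache;

              /* A dedicated cache packs 17k objects at ~17k granularity
               * instead of letting them fall into the generic 32k kmalloc
               * slab. Caveat from the comment above: slab merging may alias
               * this cache to a compatible existing one, defeating the
               * purpose. */
              static int rqbd_cache_init(void)
              {
                      rqbd_cache = kmem_cache_create("ptlrpc_rqbd_17k",
                                                     RQBD_REAL_SIZE, 0, 0, NULL);
                      return rqbd_cache ? 0 : -ENOMEM;
              }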

            Bruno, my main concern is that a static tunable will not avoid a similar problem for most users.

            adilger Andreas Dilger added a comment

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: bfaccini Bruno Faccini (Inactive)
              Votes: 0
              Watchers: 11
