  Lustre / LU-5841

Lustre 2.4.2 MDS, hitting OOM errors

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.2
    • Environment: Linux meerkat-mds-10-1.local 2.6.32-358.23.2.el6_lustre.x86_64 #1 SMP Thu Dec 19 19:57:45 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
    • Severity: 3
    • 16369

    Description

      Lustre 2.4.2, MDS reports OOM:

      Please see attached logs.

      Attachments

        1. Screenshot-meerkat-1104.png
          31 kB
        2. dmesg-meerkat-11-04
          497 kB
        3. dmesg.1101
          474 kB
        4. debug_kernel.1101.gz
          0.2 kB

        Issue Links

          duplicates LU-5726

          Activity

            [LU-5841] Lustre 2.4.2 MDS, hitting OOM errors
            niu Niu Yawei (Inactive) added a comment -

            Dup of LU-5726.

            niu Niu Yawei (Inactive) added a comment -

            Could you try if the fix of LU-5726 can resolve your problem as well? Thanks.
            rmohr Rick Mohr added a comment -

            Haisong,

            Disabling zone_reclaim_mode seemed to fix our original issue with sluggish MDS performance, although I really don't know if this is in any way directly related to LU-5726 or not.


            haisong Haisong Cai (Inactive) added a comment -

            Andreas,

            Indeed, we have had vm.zone_reclaim_mode=0 set on our MDS servers since last Wednesday. From observation
            with "collectl -sM", there are two noticeable changes:

            1) buffer memory doesn't grow like it used to, and
            2) used memory is balanced between the two CPU nodes, whereas before one node was two or three times higher than the other.

            Here is a sample output I got just now:

            [root@meerkat-mds-10-2 ~]# collectl -sM -i 10
            waiting for 10 second sample...

            MEMORY STATISTICS
            Node    Total     Used    Free    Slab  Mapped    Anon  Locked  Inact   Hit%
              0 12279M 10422M 1856M 2458M 3140K 41836K 0 3528M 100.00
              1 12288M 9831M 2456M 3529M 2988K 33116K 0 2768M 100.00
              0 12279M 10422M 1856M 2458M 3140K 41840K 0 3528M 100.00
              1 12288M 9832M 2455M 3529M 2988K 33112K 0 2767M 100.00
              0 12279M 10422M 1856M 2457M 3048K 41836K 0 3528M 100.00
              1 12288M 9833M 2454M 3530M 2988K 33004K 0 2767M 100.00
              0 12279M 10423M 1855M 2458M 3140K 41844K 0 3528M 100.00
              1 12288M 9835M 2452M 3532M 2988K 33108K 0 2768M 100.00

            Haisong
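
            (For anyone cross-checking these numbers: a similar per-NUMA-node memory breakdown is available from numastat in the numactl package, and the tunable itself can be verified directly. This is a generic sketch, not output from the meerkat MDS:)

            # verify zone reclaim is really off (0 = disabled)
            cat /proc/sys/vm/zone_reclaim_mode

            # per-node memory usage in MB, one column per NUMA node
            numastat -m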


            adilger Andreas Dilger added a comment -

            Haisong, to clarify, you are now running your MDS with vm.zone_reclaim_mode=0 and that has resolved, or at least reduced, the memory problems?

            We should consider setting this tunable by default on MDS nodes via mount.lustre, as we do with other tunables. There is some concern that this would go against the tunings of the administrator, and I'm not sure how to best handle that...
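
            (Purely as an illustration of the "don't override the administrator" concern above — this is not an existing mount.lustre option — a mount-time helper could skip the change whenever the admin has configured the tunable explicitly:)

            # hypothetical helper sketch: only relax zone reclaim if the admin
            # has not set it explicitly in /etc/sysctl.conf
            if ! grep -q '^vm.zone_reclaim_mode' /etc/sysctl.conf; then
                sysctl -w vm.zone_reclaim_mode=0
            fi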


            haisong Haisong Cai (Inactive) added a comment -

            Hi Yawei,

            Typical symptoms of this problem, in our case at least, have been hangs of processes, whether LNET, MDT, or MGC. Not only do processes hang; a lot of the time the MDS OS itself would hang for a few minutes at a time. What you are seeing, I believe, are the results of some hanging LNET or Lustre network processes, followed by disconnections from the OSS/OSTs and clients.

            We have implemented the suggestion from Rick Mohr by disabling vm.zone_reclaim_mode on the MDS. So far the MDS has been behaving. We will continue monitoring.

            thanks,
            Haisong
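
            (Worth noting for the record: sysctl -w only changes the running kernel, so on an EL6 MDS the usual way to keep the setting across reboots is to add it to /etc/sysctl.conf as well, e.g.:)

            # make the setting persistent across MDS reboots
            echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf
            sysctl -p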


            niu Niu Yawei (Inactive) added a comment -

            Hi, Haisong

            The log & stack trace show that the server ran into an OOM situation at the end, and the initial cause is an unstable network. We can see lots of client reconnects and bulk IO timeout errors on the MDT at the beginning; could you check whether your network is healthy?

            The last crash, in lu_context_key_degister(), is a dup of LU-3806, I think.
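
            (A couple of standard LNet sanity checks that can be run from the MDS for the network question; the NID below is only a placeholder:)

            # show this node's NIDs, then ping a client or OSS NID over LNet
            lctl list_nids
            lctl ping 10.0.0.5@o2ib

            # look for the reconnect / bulk timeout messages in the kernel log
            dmesg | grep -Ei "reconnect|timed out|bulk"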


            haisong Haisong Cai (Inactive) added a comment -

            Rick,

            Thank you for the note. I saw your comments in LU-5726 today and have disabled vm.zone_reclaim_mode.
            In LU-5726, you commented that after disabling vm.zone_reclaim_mode it "... just took longer for the same underlying problem to become evident again". Has the problem reoccurred on your MDS?

            thanks,
            Haisong

            rmohr Rick Mohr added a comment -

            Do you have vm.zone_reclaim_mode=0 set on your MDS server? I ran into issues with sluggish MDS server performance earlier this year that were fixed by setting that parameter.
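
            (For reference, checking and changing the tunable on a live server is a one-liner each; 1 means NUMA zone reclaim is enabled, 0 disables it:)

            # current value
            cat /proc/sys/vm/zone_reclaim_mode

            # disable zone reclaim on the running kernel
            sysctl -w vm.zone_reclaim_mode=0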


            haisong Haisong Cai (Inactive) added a comment -

            The MDS came to a point where it became unresponsive: system load at 65, buffer memory at 20GB out of 24GB total and not releasing.
            I attempted unmounting the MDT and rebooting the MDS, at which point the server kernel panicked.

            Screen dump attached here.
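
            (As a general diagnostic for buffer/cache memory that "won't release" — not something recorded in this ticket — one can check how much of it the kernel is able to reclaim on request:)

            # snapshot memory, drop clean page cache + dentries/inodes, compare
            free -m
            sync
            echo 3 > /proc/sys/vm/drop_caches
            free -m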


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: haisong Haisong Cai (Inactive)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: