Lustre / LU-6607

MDS (2-node DNE) running out of memory and crash

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Blocker
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Severity: 4

    Description

      2-node DNE MDS
      16 OSS
      2K clients

      An MDS node randomly runs out of memory and hangs.
      We watch the MDS drain its memory in a matter of minutes, many times right after recovery from a previous hang.

      Clients are generating a huge number of Lustre errors containing the string "ptlrpc_expire_one_request", ranging from several hundred thousand to several million per node. Here are error counts from some nodes:

      comet-12-31 662616
      comet-10-06 690764
      comet-12-24 720396
      comet-12-25 735659
      comet-12-14 778073
      comet-12-33 840302
      comet-10-10 928322
      comet-12-33 945614
      comet-12-25 992288
      comet-10-15 1131711
      comet-12-25 1147043
      comet-10-07 1160876
      comet-12-30 1180270
      comet-10-03 1387072
      comet-10-02 2515764
      comet-10-02 3371128

      I am attaching logs from both client and server for one such incident.
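The per-node tallies above can be reproduced with a simple grep over each client's syslog. A minimal sketch, using a synthetic log file in place of the real per-node log (the path and log lines are illustrative, not from the attached logs):

```shell
# Count ptlrpc_expire_one_request errors in a client log.
# A tiny synthetic log stands in for the real per-node syslog.
log=$(mktemp)
printf 'Lustre: ptlrpc_expire_one_request timed out\nunrelated line\nLustre: ptlrpc_expire_one_request timed out\n' > "$log"
count=$(grep -c 'ptlrpc_expire_one_request' "$log")
echo "$count"    # prints 2 for this sample
rm -f "$log"
```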

      Attachments

        1. dmesg_mds.gz
          21 kB
        2. lustre-log.tgz
          9.35 MB
        3. messages-19-6.gz
          92 kB
        4. clients_log.gz
          622 kB
        5. dmesg.out
          396 kB
        6. slabinfo.txt
          27 kB

        Activity

          pjones Peter Jones added a comment -

          SDSC have moved on to more current releases, so I do not think any further work is needed here.
          di.wang Di Wang added a comment -

          Hello, Haisong

          Yes, I do not know the exact reason why the size-8192 slab consumed so much memory here. No, I do not think this is related to any default setting. Did you do a lot of cross-MDT operations here, like creating remote directories or striped directories? (Unfortunately, there is not enough stack trace information here.) Btw: was this stack trace collected when the OOM happened, before it, or when it was about to happen? Right now, I would suggest:

          1. Use 2.7.58 plus that patch (http://review.whamcloud.com/#/c/14926/) you need; maybe also include http://review.whamcloud.com/#/c/16161/.
          2. Please add "log_buf_len=10M" to your boot command, so we can see more of the stack trace when the error happens.
          3. Please help me find an easy way to reproduce the problem. Thanks!

          Even though 2.7.58 might not fix this issue, it is way better than 2.7.51 on DNE.
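The boot-command change in item 2 is a kernel-parameter edit. A minimal sketch of appending log_buf_len=10M to a grub kernel line, done here on a throwaway copy (on CentOS 6 the real file is typically /boot/grub/grub.conf; the kernel line shown is illustrative):

```shell
# Append log_buf_len=10M to a kernel boot line.
# Works on a temp copy; nothing real is touched.
cfg=$(mktemp)
printf 'kernel /vmlinuz-3.10.73 ro root=/dev/sda1 quiet\n' > "$cfg"
sed -i 's/^kernel .*/& log_buf_len=10M/' "$cfg"
line=$(grep '^kernel' "$cfg")
echo "$line"
rm -f "$cfg"
```

After the edit the kernel line ends with `log_buf_len=10M`; a reboot is required for the larger dmesg buffer to take effect.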

          haisong Haisong Cai (Inactive) added a comment -

          Hi WangDi,

          You stated that 2.7.58 has a lot of fixes, but it may still not fix our problem, correct?
          Can you elaborate on the slab situation? You indicated 941G (or 94G) was too big; why is that? Is it because of a default setting or some configuration mistake?

          thanks,
          Haisong
          di.wang Di Wang added a comment -

          Ah, it is. You can use that build. Thanks.

          haisong Haisong Cai (Inactive) added a comment -

          Hi Wang Di,

          I understand LU-6584 is a different problem, about OSS memory and not MDS memory.

          What I said earlier was: to work on the LU-6584 problem, we have to apply a patch soon, because they are the same file-system. That patch is built with http://review.whamcloud.com/#/c/14926/

          Is that equivalent to 2.7.58?

          Haisong
          di.wang Di Wang added a comment -

          Hmm, I think LU-6584 is a different issue. This ticket is about MDS OOM during failover? Do you happen to know an easy way to reproduce this problem?
          Btw: is it possible for you to add "log_buf_len=10M" to your boot command? The dmesg you posted here only has half of the stack trace. Thanks.

          haisong Haisong Cai (Inactive) added a comment -

          LU-6584 is about an OSS crashing problem. Those OSS servers are part of the same cluster as these very MDS servers; they are one file-system.

          We are about to apply a new patch related to LU-6584. It is built from http://review.whamcloud.com/#/c/14926/

          Will it satisfy your recommendation?

          Haisong
          di.wang Di Wang added a comment -

          Is it possible for you to upgrade the MDS to 2.7.58? There have been quite a few fixes in this area since 2.7.51.

          Btw: we are currently testing ZFS on DNE in LU-7009; please follow along there.

          haisong Haisong Cai (Inactive) added a comment -

          On one of the 2 MDS servers:

          [root@panda-mds-19-6 panda-mds-19-6]# sysctl -a | grep slab
          kernel.spl.kmem.slab_kmem_alloc = 92736
          kernel.spl.kmem.slab_kmem_max = 92736
          kernel.spl.kmem.slab_kmem_total = 172032
          kernel.spl.kmem.slab_vmem_alloc = 407675904
          kernel.spl.kmem.slab_vmem_max = 490480640
          kernel.spl.kmem.slab_vmem_total = 485459072
          vm.min_slab_ratio = 5
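To see which slab cache is actually eating the memory (e.g. the size-8192 cache discussed above), /proc/slabinfo can be ranked by approximate bytes (num_objs × objsize). A minimal sketch over a small embedded sample; the numbers are invented, and on the MDS the input would be /proc/slabinfo itself:

```shell
# Rank slab caches by approximate memory use (num_objs * objsize).
# Embedded sample stands in for /proc/slabinfo; values are invented.
slab=$(mktemp)
cat > "$slab" <<'EOF'
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
size-8192             120000 130000   8192    1    2
size-1024                500    600   1024    4    1
dentry                  2000   2500    192   20    1
EOF
top=$(awk 'NR>2 { print $1, $3*$4 }' "$slab" | sort -k2,2 -rn | head -1)
echo "$top"    # prints: size-8192 1064960000
rm -f "$slab"
```

Note the sysctl output above only covers SPL (ZFS) slabs; the size-8192 cache lives in the regular kernel slab allocator, which is why /proc/slabinfo (or the attached slabinfo.txt) is the place to look.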

          haisong Haisong Cai (Inactive) added a comment -

          Hi WangDi,

          We are running CentOS 6.6 with Linux kernel 3.10.73 from ELRepo.
          Lustre and ZFS are built as DKMS modules.

          The filesystem has 16 OSS nodes, each with 6 OSTs.

          Haisong

          People

            Assignee: laisiyao Lai Siyao
            Reporter: haisong Haisong Cai (Inactive)
            Votes: 1
            Watchers: 6
