Lustre / LU-12178

MDS deadlock with 2.12.0 (quotas?)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Affects Version/s: Lustre 2.12.0
    • Labels: None
    • Environment: CentOS 7.6 - 3.10.0-957.1.3.el7_lustre.x86_64
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      We just got some kind of deadlock on fir-md1-s2 that serves MDT0001 and MDT0003.

      I took a crash dump because the MDS was unusable and the filesystem was hanging.
      I am attaching the output of foreach bt as bt.all, along with vmcore-dmesg.txt.

      Also, here is some output from the crash session:

      crash> foreach bt >bt.all
      crash> kmem -i
                       PAGES        TOTAL      PERCENTAGE
          TOTAL MEM  65891316     251.4 GB         ----
               FREE  13879541      52.9 GB   21% of TOTAL MEM
               USED  52011775     198.4 GB   78% of TOTAL MEM
             SHARED  32692923     124.7 GB   49% of TOTAL MEM
            BUFFERS  32164243     122.7 GB   48% of TOTAL MEM
             CACHED   781150         3 GB    1% of TOTAL MEM
               SLAB  13801776      52.6 GB   20% of TOTAL MEM
      
         TOTAL HUGE        0            0         ----
          HUGE FREE        0            0    0% of TOTAL HUGE
      
         TOTAL SWAP  1048575         4 GB         ----
          SWAP USED        0            0    0% of TOTAL SWAP
          SWAP FREE  1048575         4 GB  100% of TOTAL SWAP
      
       COMMIT LIMIT  33994233     129.7 GB         ----
          COMMITTED   228689     893.3 MB    0% of TOTAL LIMIT
      crash> ps | grep ">"
      >     0      0   0  ffffffffb4218480  RU   0.0       0      0  [swapper/0]
      >     0      0   1  ffff8ec129410000  RU   0.0       0      0  [swapper/1]
      >     0      0   2  ffff8ed129f10000  RU   0.0       0      0  [swapper/2]
      >     0      0   3  ffff8ee129ebe180  RU   0.0       0      0  [swapper/3]
      >     0      0   4  ffff8eb1a9ba6180  RU   0.0       0      0  [swapper/4]
      >     0      0   5  ffff8ec129416180  RU   0.0       0      0  [swapper/5]
      >     0      0   6  ffff8ed129f16180  RU   0.0       0      0  [swapper/6]
      >     0      0   7  ffff8ee129eba080  RU   0.0       0      0  [swapper/7]
      >     0      0   8  ffff8eb1a9ba30c0  RU   0.0       0      0  [swapper/8]
      >     0      0   9  ffff8ec129411040  RU   0.0       0      0  [swapper/9]
      >     0      0  10  ffff8ed129f11040  RU   0.0       0      0  [swapper/10]
      >     0      0  11  ffff8ee129ebd140  RU   0.0       0      0  [swapper/11]
      >     0      0  12  ffff8eb1a9ba5140  RU   0.0       0      0  [swapper/12]
      >     0      0  13  ffff8ec129415140  RU   0.0       0      0  [swapper/13]
      >     0      0  14  ffff8ed129f15140  RU   0.0       0      0  [swapper/14]
      >     0      0  15  ffff8ee129ebb0c0  RU   0.0       0      0  [swapper/15]
      >     0      0  16  ffff8eb1a9ba4100  RU   0.0       0      0  [swapper/16]
      >     0      0  17  ffff8ec129412080  RU   0.0       0      0  [swapper/17]
      >     0      0  19  ffff8ee129ebc100  RU   0.0       0      0  [swapper/19]
      >     0      0  20  ffff8eb1a9408000  RU   0.0       0      0  [swapper/20]
      >     0      0  21  ffff8ec129414100  RU   0.0       0      0  [swapper/21]
      >     0      0  22  ffff8ed129f14100  RU   0.0       0      0  [swapper/22]
      >     0      0  23  ffff8ee129f38000  RU   0.0       0      0  [swapper/23]
      >     0      0  24  ffff8eb1a940e180  RU   0.0       0      0  [swapper/24]
      >     0      0  25  ffff8ec1294130c0  RU   0.0       0      0  [swapper/25]
      >     0      0  26  ffff8ed129f130c0  RU   0.0       0      0  [swapper/26]
      >     0      0  27  ffff8ee129f3e180  RU   0.0       0      0  [swapper/27]
      >     0      0  28  ffff8eb1a9409040  RU   0.0       0      0  [swapper/28]
      >     0      0  29  ffff8ec129430000  RU   0.0       0      0  [swapper/29]
      >     0      0  30  ffff8ed129f50000  RU   0.0       0      0  [swapper/30]
      >     0      0  31  ffff8ee129f39040  RU   0.0       0      0  [swapper/31]
      >     0      0  32  ffff8eb1a940d140  RU   0.0       0      0  [swapper/32]
      >     0      0  33  ffff8ec129436180  RU   0.0       0      0  [swapper/33]
      >     0      0  34  ffff8ed129f56180  RU   0.0       0      0  [swapper/34]
      >     0      0  35  ffff8ee129f3d140  RU   0.0       0      0  [swapper/35]
      >     0      0  36  ffff8eb1a940a080  RU   0.0       0      0  [swapper/36]
      >     0      0  37  ffff8ec129431040  RU   0.0       0      0  [swapper/37]
      >     0      0  38  ffff8ed129f51040  RU   0.0       0      0  [swapper/38]
      >     0      0  39  ffff8ee129f3a080  RU   0.0       0      0  [swapper/39]
      >     0      0  40  ffff8eb1a940c100  RU   0.0       0      0  [swapper/40]
      >     0      0  41  ffff8ec129435140  RU   0.0       0      0  [swapper/41]
      >     0      0  42  ffff8ed129f55140  RU   0.0       0      0  [swapper/42]
      >     0      0  43  ffff8ee129f3c100  RU   0.0       0      0  [swapper/43]
      >     0      0  44  ffff8eb1a940b0c0  RU   0.0       0      0  [swapper/44]
      >     0      0  45  ffff8ec129432080  RU   0.0       0      0  [swapper/45]
      >     0      0  46  ffff8ed129f52080  RU   0.0       0      0  [swapper/46]
      >     0      0  47  ffff8ee129f3b0c0  RU   0.0       0      0  [swapper/47]
      > 109549  109543  18  ffff8ee05406c100  RU   0.0  115440   2132  bash
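
      In the listing above, every CPU is running its idle swapper task except CPU 18, which is running bash (presumably the shell used to trigger the dump). That pattern suggests the stuck threads were sleeping rather than spinning. As a minimal sketch, assuming the column layout shown above, the non-idle tasks can be picked out of the ps output like this (the PS_LINES sample is a subset copied from the listing; the function name is illustrative):

```python
import re

# Three rows copied from the `ps | grep ">"` listing above.
PS_LINES = """\
>     0      0  17  ffff8ec129412080  RU   0.0       0      0  [swapper/17]
> 109549  109543  18  ffff8ee05406c100  RU   0.0  115440   2132  bash
>     0      0  19  ffff8ee129ebc100  RU   0.0       0      0  [swapper/19]
"""

# Columns: marker, PID, PPID, CPU, TASK address, state, %MEM, VSZ, RSS, COMM.
ROW = re.compile(
    r"^>\s+(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-f]+)\s+(\S+)\s+\S+\s+\d+\s+\d+\s+(.+)$"
)


def non_idle(ps_text):
    """Return (cpu, pid, command) for active tasks that are not idle swappers."""
    found = []
    for line in ps_text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # skip anything that is not an active-task row
        pid, _ppid, cpu, _task, _state, comm = match.groups()
        if not comm.startswith("[swapper/"):
            found.append((int(cpu), int(pid), comm))
    return found


if __name__ == "__main__":
    for cpu, pid, comm in non_idle(PS_LINES):
        print(f"CPU {cpu}: pid {pid} ({comm})")
```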
       

      I noticed a lot of threads blocked on quota commands.
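
      One way to quantify this from bt.all is to group the backtraces by whether any frame name contains a keyword such as "quota". The sketch below assumes the usual crash bt layout (a PID/TASK/CPU/COMMAND header followed by numbered frames). The SAMPLE text is synthetic and its task and function names are hypothetical; it is not taken from the attached dump.

```python
import re
from collections import Counter

# Synthetic sample in the usual `bt` layout; names are hypothetical.
SAMPLE = """\
PID: 1001   TASK: ffff000000000001  CPU: 0   COMMAND: "worker_a"
 #0 [ffff000000001000] __schedule at ffffffff00000001
 #1 [ffff000000001010] schedule at ffffffff00000002
 #2 [ffff000000001020] example_quota_wait at ffffffff00000003 [example]

PID: 1002   TASK: ffff000000000002  CPU: 1   COMMAND: "worker_a"
 #0 [ffff000000002000] __schedule at ffffffff00000001
 #1 [ffff000000002010] example_quota_wait at ffffffff00000003 [example]

PID: 1003   TASK: ffff000000000003  CPU: 2   COMMAND: "worker_b"
 #0 [ffff000000003000] __schedule at ffffffff00000001
 #1 [ffff000000003010] schedule at ffffffff00000002
"""

HEADER = re.compile(
    r'^PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+CPU:\s*(\d+)\s+COMMAND:\s*"(.*)"'
)
FRAME = re.compile(r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")


def parse_backtraces(text):
    """Split `foreach bt` output into one record per task."""
    tasks = []
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {"pid": int(header.group(1)),
                       "comm": header.group(4),
                       "frames": []}
            tasks.append(current)
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            current["frames"].append(frame.group(1))
    return tasks


def count_by_keyword(tasks, keyword):
    """Count tasks, per command name, with a frame containing `keyword`."""
    hits = Counter()
    for task in tasks:
        if any(keyword in name for name in task["frames"]):
            hits[task["comm"]] += 1
    return hits


if __name__ == "__main__":
    tasks = parse_backtraces(SAMPLE)
    for comm, n in count_by_keyword(tasks, "quota").most_common():
        print(f"{comm}: {n} thread(s) with a quota-related frame")
```

      To run it against the real dump, read the attached bt.all into a string and pass it to parse_backtraces in place of SAMPLE.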

      Attachments

        1. bt.all (1.68 MB)
        2. fir-md1-s2-20190424-ldiskfs-event.log (118 kB)
        3. vmcore-dmesg.txt (1.01 MB)

          People

            Assignee: Alex Zhuravlev (bzzz)
            Reporter: Stephane Thiell (sthiell)