Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Affects Version/s: Lustre 2.12.0
    • Labels: None
    • Environment: CentOS 7.6 - 3.10.0-957.1.3.el7_lustre.x86_64
    • Severity: 3

    Description

      We just got some kind of deadlock on fir-md1-s2 that serves MDT0001 and MDT0003.

      I took a crash dump because the MDS was not usable and the filesystem was hanging.
      Attaching the output of foreach bt (bt.all) and vmcore-dmesg.txt.

      Also, here is some output from the crash session:

      crash> foreach bt >bt.all
      crash> kmem -i
                       PAGES        TOTAL      PERCENTAGE
          TOTAL MEM  65891316     251.4 GB         ----
               FREE  13879541      52.9 GB   21% of TOTAL MEM
               USED  52011775     198.4 GB   78% of TOTAL MEM
             SHARED  32692923     124.7 GB   49% of TOTAL MEM
            BUFFERS  32164243     122.7 GB   48% of TOTAL MEM
             CACHED   781150         3 GB    1% of TOTAL MEM
               SLAB  13801776      52.6 GB   20% of TOTAL MEM
      
         TOTAL HUGE        0            0         ----
          HUGE FREE        0            0    0% of TOTAL HUGE
      
         TOTAL SWAP  1048575         4 GB         ----
          SWAP USED        0            0    0% of TOTAL SWAP
          SWAP FREE  1048575         4 GB  100% of TOTAL SWAP
      
       COMMIT LIMIT  33994233     129.7 GB         ----
          COMMITTED   228689     893.3 MB    0% of TOTAL LIMIT
      crash> ps | grep ">"
      >     0      0   0  ffffffffb4218480  RU   0.0       0      0  [swapper/0]
      >     0      0   1  ffff8ec129410000  RU   0.0       0      0  [swapper/1]
      >     0      0   2  ffff8ed129f10000  RU   0.0       0      0  [swapper/2]
      >     0      0   3  ffff8ee129ebe180  RU   0.0       0      0  [swapper/3]
      >     0      0   4  ffff8eb1a9ba6180  RU   0.0       0      0  [swapper/4]
      >     0      0   5  ffff8ec129416180  RU   0.0       0      0  [swapper/5]
      >     0      0   6  ffff8ed129f16180  RU   0.0       0      0  [swapper/6]
      >     0      0   7  ffff8ee129eba080  RU   0.0       0      0  [swapper/7]
      >     0      0   8  ffff8eb1a9ba30c0  RU   0.0       0      0  [swapper/8]
      >     0      0   9  ffff8ec129411040  RU   0.0       0      0  [swapper/9]
      >     0      0  10  ffff8ed129f11040  RU   0.0       0      0  [swapper/10]
      >     0      0  11  ffff8ee129ebd140  RU   0.0       0      0  [swapper/11]
      >     0      0  12  ffff8eb1a9ba5140  RU   0.0       0      0  [swapper/12]
      >     0      0  13  ffff8ec129415140  RU   0.0       0      0  [swapper/13]
      >     0      0  14  ffff8ed129f15140  RU   0.0       0      0  [swapper/14]
      >     0      0  15  ffff8ee129ebb0c0  RU   0.0       0      0  [swapper/15]
      >     0      0  16  ffff8eb1a9ba4100  RU   0.0       0      0  [swapper/16]
      >     0      0  17  ffff8ec129412080  RU   0.0       0      0  [swapper/17]
      >     0      0  19  ffff8ee129ebc100  RU   0.0       0      0  [swapper/19]
      >     0      0  20  ffff8eb1a9408000  RU   0.0       0      0  [swapper/20]
      >     0      0  21  ffff8ec129414100  RU   0.0       0      0  [swapper/21]
      >     0      0  22  ffff8ed129f14100  RU   0.0       0      0  [swapper/22]
      >     0      0  23  ffff8ee129f38000  RU   0.0       0      0  [swapper/23]
      >     0      0  24  ffff8eb1a940e180  RU   0.0       0      0  [swapper/24]
      >     0      0  25  ffff8ec1294130c0  RU   0.0       0      0  [swapper/25]
      >     0      0  26  ffff8ed129f130c0  RU   0.0       0      0  [swapper/26]
      >     0      0  27  ffff8ee129f3e180  RU   0.0       0      0  [swapper/27]
      >     0      0  28  ffff8eb1a9409040  RU   0.0       0      0  [swapper/28]
      >     0      0  29  ffff8ec129430000  RU   0.0       0      0  [swapper/29]
      >     0      0  30  ffff8ed129f50000  RU   0.0       0      0  [swapper/30]
      >     0      0  31  ffff8ee129f39040  RU   0.0       0      0  [swapper/31]
      >     0      0  32  ffff8eb1a940d140  RU   0.0       0      0  [swapper/32]
      >     0      0  33  ffff8ec129436180  RU   0.0       0      0  [swapper/33]
      >     0      0  34  ffff8ed129f56180  RU   0.0       0      0  [swapper/34]
      >     0      0  35  ffff8ee129f3d140  RU   0.0       0      0  [swapper/35]
      >     0      0  36  ffff8eb1a940a080  RU   0.0       0      0  [swapper/36]
      >     0      0  37  ffff8ec129431040  RU   0.0       0      0  [swapper/37]
      >     0      0  38  ffff8ed129f51040  RU   0.0       0      0  [swapper/38]
      >     0      0  39  ffff8ee129f3a080  RU   0.0       0      0  [swapper/39]
      >     0      0  40  ffff8eb1a940c100  RU   0.0       0      0  [swapper/40]
      >     0      0  41  ffff8ec129435140  RU   0.0       0      0  [swapper/41]
      >     0      0  42  ffff8ed129f55140  RU   0.0       0      0  [swapper/42]
      >     0      0  43  ffff8ee129f3c100  RU   0.0       0      0  [swapper/43]
      >     0      0  44  ffff8eb1a940b0c0  RU   0.0       0      0  [swapper/44]
      >     0      0  45  ffff8ec129432080  RU   0.0       0      0  [swapper/45]
      >     0      0  46  ffff8ed129f52080  RU   0.0       0      0  [swapper/46]
      >     0      0  47  ffff8ee129f3b0c0  RU   0.0       0      0  [swapper/47]
      > 109549  109543  18  ffff8ee05406c100  RU   0.0  115440   2132  bash
       

      I noticed a lot of threads blocked on quota commands.
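
      To get a quick count of those from the dump, something like the following should work (a sketch: foreach UN bt is standard crash syntax for dumping only the uninterruptible tasks, but the qsd_*/qmt_* patterns are just the usual Lustre quota-slave/quota-master function prefixes, used here as an illustrative guess rather than an exact match for these traces):

      crash> foreach UN bt > bt.blocked

      # Count which quota functions the blocked threads are sitting in:
      $ grep -oE 'qsd_[a-z_]+|qmt_[a-z_]+' bt.blocked | sort | uniq -c | sort -rn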

      Attachments

        1. bt.all
          1.68 MB
        2. fir-md1-s2-20190424-ldiskfs-event.log
          118 kB
        3. vmcore-dmesg.txt
          1.01 MB

        Activity

          [LU-12178] MDS deadlock with 2.12.0 (quotas?)

           gerrit Gerrit Updater added a comment -

           Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34926/
           Subject: LU-12178 osd: do not rebalance quota under memory pressure
           Project: fs/lustre-release
           Branch: b2_12
           Current Patch Set:
           Commit: 8c0b1c9af812140bde14180a318ace834d077d4b

           gerrit Gerrit Updater added a comment -

           Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34926
           Subject: LU-12178 osd: do not rebalance quota under memory pressure
           Project: fs/lustre-release
           Branch: b2_12
           Current Patch Set: 1
           Commit: 380447f63b8a9e6a232e1ea81b1e68c39bc28cf2
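
           In case anyone wants to test this before it lands: each Gerrit patch set is fetchable under the standard refs/changes layout (a sketch; the ref below follows the usual Gerrit convention for change 34926, patch set 1, rather than being copied from the review page):

           $ git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/26/34926/1
           $ git cherry-pick FETCH_HEAD
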
           pjones Peter Jones added a comment -

           Landed for 2.13

           gerrit Gerrit Updater added a comment -

           Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34741/
           Subject: LU-12178 osd: do not rebalance quota under memory pressure
           Project: fs/lustre-release
           Branch: master
           Current Patch Set:
           Commit: c5e5b7cd872eb2fa0028cef8b1a5e5c51b085b44

           sthiell Stephane Thiell added a comment -

           We do have 4x18TB MDTs for DoM, so in case you want to see the formatting options, please see below (we do have the extent flag):

          [root@fir-md1-s1 ~]# dumpe2fs -h /dev/mapper/md1-rbod1-mdt0 
          dumpe2fs 1.44.3.wc1 (23-July-2018)
          Filesystem volume name:   fir-MDT0000
          Last mounted on:          /
          Filesystem UUID:          d929671c-a108-4120-86aa-783d4601057a
          Filesystem magic number:  0xEF53
          Filesystem revision #:    1 (dynamic)
          Filesystem features:      has_journal ext_attr dir_index filetype needs_recovery extent 64bit mmp flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink quota
          Filesystem flags:         signed_directory_hash 
          Default mount options:    user_xattr acl
          Filesystem state:         clean
          Errors behavior:          Continue
          Filesystem OS type:       Linux
          Inode count:              288005760
          Block count:              4681213440
          Reserved block count:     234060672
          Free blocks:              4219981031
          Free inodes:              250082482
          First block:              0
          Block size:               4096
          Fragment size:            4096
          Group descriptor size:    64
          Blocks per group:         32768
          Fragments per group:      32768
          Inodes per group:         2016
          Inode blocks per group:   504
          Flex block group size:    16
          Filesystem created:       Thu Jan 24 14:00:46 2019
          Last mount time:          Fri Apr 26 06:56:02 2019
          Last write time:          Fri Apr 26 06:56:02 2019
          Mount count:              57
          Maximum mount count:      -1
          Last checked:             Thu Jan 24 14:00:46 2019
          Check interval:           0 (<none>)
          Lifetime writes:          23 TB
          Reserved blocks uid:      0 (user root)
          Reserved blocks gid:      0 (group root)
          First inode:              11
           Inode size:               1024
          Required extra isize:     32
          Desired extra isize:      32
          Journal inode:            8
          Default directory hash:   half_md4
          Directory Hash Seed:      d9ae92da-e0cd-43f5-a26b-e6a4e9c64832
          Journal backup:           inode blocks
          MMP block number:         10335
          MMP update interval:      5
          User quota inode:         3
          Group quota inode:        4
          Journal features:         journal_incompat_revoke journal_64bit
          Journal size:             4096M
          Journal length:           1048576
          Journal sequence:         0x010f2bf0
          Journal start:            1
          MMP_block:
              mmp_magic: 0x4d4d50
              mmp_check_interval: 10
              mmp_sequence: 0x00030d
              mmp_update_date: Fri Apr 26 08:01:02 2019
              mmp_update_time: 1556290862
              mmp_node_name: fir-md1-s1
              mmp_device_name: dm-4
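
           If you only need the DoM-relevant bits, the full header dump can be filtered down (a sketch using the same device path as above):

           [root@fir-md1-s1 ~]# dumpe2fs -h /dev/mapper/md1-rbod1-mdt0 2>/dev/null | grep -E 'Filesystem features|Inode size'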
          

           sthiell Stephane Thiell added a comment -

           We haven't applied the patch yet and the problem has not happened again, but while checking the server logs I noticed an ldiskfs-related event that looks like ldiskfs blocked in list_sort. The server did recover and we had no report of a slowdown, but just in case, I attached fir-md1-s2-20190424-ldiskfs-event.log.
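
           For reference, the hung-task traces in that log can be located via the stock kernel watchdog message (a sketch; "blocked for more than" is the standard hung-task text, and the context window is arbitrary):

           $ grep -B1 -A15 'blocked for more than' fir-md1-s2-20190424-ldiskfs-event.log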

           sthiell Stephane Thiell added a comment -

           Thanks bzzz, that sounds great! We'll likely wait until the patch has landed in master, unless the issue happens again before that.

          People

            Assignee: bzzz Alex Zhuravlev
            Reporter: sthiell Stephane Thiell
            Votes: 0
            Watchers: 8
