LU-12505: mounting bigalloc enabled large OST takes a long time

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • master
    • 3

    Description

      Mounting a large OST device that has 'bigalloc' enabled takes a very long time to complete. This happens not only when the device is mounted as a Lustre OST, but also when the OSS mounts it directly as ldiskfs:

      # time mount -t ldiskfs /dev/ddn/scratch0_ost0000 /lustre/scratch0/ost0000
      
      real    12m32.153s
      user    0m0.000s
      sys     11m49.887s
      
      # dumpe2fs -h /dev/ddn/scratch0_ost0000
      dumpe2fs 1.45.2.wc1 (27-May-2019)
      Filesystem volume name:   scratch0-OST0000
      Last mounted on:          /
      Filesystem UUID:          1ca9dd81-8b70-4805-a430-78b0eafc1c45
      Filesystem magic number:  0xEF53
      Filesystem revision #:    1 (dynamic)
      Filesystem features:      has_journal ext_attr dir_index filetype needs_recovery meta_bg extent 64bit mmp flex_bg sparse_super large_file huge_file uninit_bg dir_nlink quota bigalloc
      Filesystem flags:         signed_directory_hash 
      Default mount options:    user_xattr acl
      Filesystem state:         clean
      Errors behavior:          Continue
      Filesystem OS type:       Linux
      Inode count:              1074397184
      Block count:              275045679104
      Reserved block count:     2750456791
      Free blocks:              274909403680
      Free inodes:              1074396851
      First block:              0
      Block size:               4096
      Cluster size:             131072
      Group descriptor size:    64
      Blocks per group:         1048576
      Clusters per group:       32768
      Inodes per group:         4096
      Inode blocks per group:   512
      RAID stride:              512
      RAID stripe width:        512
      Flex block group size:    256
      Filesystem created:       Mon Jul  1 00:43:14 2019
      Last mount time:          Wed Jul  3 05:55:22 2019
      Last write time:          Wed Jul  3 05:55:22 2019
      Mount count:              8
      Maximum mount count:      -1
      Last checked:             Mon Jul  1 00:43:14 2019
      Check interval:           0 (<none>)
      Lifetime writes:          2693 GB
      Reserved blocks uid:      0 (user root)
      Reserved blocks gid:      0 (group root)
      First inode:              11
      Inode size:               512
      Required extra isize:     32
      Desired extra isize:      32
      Journal inode:            8
      Default directory hash:   half_md4
      Directory Hash Seed:      4eeb2234-062d-4af5-8973-872baabd2e9f
      Journal backup:           inode blocks
      MMP block number:         131680
      MMP update interval:      5
      User quota inode:         3
      Group quota inode:        4
      Journal features:         journal_incompat_revoke journal_64bit
      Journal size:             4096M
      Journal length:           1048576
      Journal sequence:         0x00000494
      Journal start:            0
      MMP_block:
          mmp_magic: 0x4d4d50
          mmp_check_interval: 10
          mmp_sequence: 0x0000cd
          mmp_update_date: Wed Jul  3 06:00:33 2019
          mmp_update_time: 1562133633
          mmp_node_name: es18k-vm11
          mmp_device_name: sda
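
The header fields above are enough to work out the geometry of this OST. The following is a minimal sketch of that arithmetic; the constants are copied from the dumpe2fs output, while the function name and dictionary keys are made up for illustration.

```python
# Values copied from the dumpe2fs header in this ticket.
BLOCK_COUNT = 275_045_679_104
BLOCK_SIZE = 4096            # bytes
CLUSTER_SIZE = 131_072       # bytes
BLOCKS_PER_GROUP = 1_048_576
CLUSTERS_PER_GROUP = 32_768
INODES_PER_GROUP = 4096


def geometry():
    """Derive cluster ratio, group count and sizes from the header fields."""
    blocks_per_cluster = CLUSTER_SIZE // BLOCK_SIZE
    # Ceiling division: a trailing partial group would still count as a group.
    groups = -(-BLOCK_COUNT // BLOCKS_PER_GROUP)
    return {
        "blocks_per_cluster": blocks_per_cluster,
        "groups": groups,
        "group_size_gib": BLOCKS_PER_GROUP * BLOCK_SIZE / 2**30,
        "fs_size_tib": BLOCK_COUNT * BLOCK_SIZE / 2**40,
        "inodes": groups * INODES_PER_GROUP,
    }


if __name__ == "__main__":
    for key, value in geometry().items():
        print(f"{key}: {value}")
```

With these inputs the device works out to 262304 block groups of 4 GiB each, about 1 PiB in total, with 32 blocks per cluster. The derived inode total agrees with the "Inode count" line of the header.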
      

      Without bigalloc, mounting the same device path takes only a few seconds:

      # time mount -t ldiskfs /dev/ddn/scratch0_ost0000 /lustre/scratch0/ost0000
      
      real	0m6.484s
      user	0m0.000s
      sys	0m4.954s
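
To put the two mount timings side by side, here is a small sketch that converts the `time` fields to seconds and compares them. The helper name `parse_time` is hypothetical; the input strings are the values reported in this ticket.

```python
import re


def parse_time(text):
    """Convert a `time` field such as '12m32.153s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Wall-clock mount times from the description.
with_bigalloc = parse_time("12m32.153s")
without_bigalloc = parse_time("0m6.484s")

# How much slower the bigalloc mount is, and how much of it is kernel time.
slowdown = with_bigalloc / without_bigalloc
sys_share = parse_time("11m49.887s") / with_bigalloc

if __name__ == "__main__":
    print(f"slowdown: {slowdown:.1f}x, sys share: {sys_share:.1%}")
```

By this measure the bigalloc mount is roughly 116 times slower, and about 94% of its wall-clock time is spent in the kernel (sys).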
      

          Activity

            adilger Andreas Dilger added a comment:

            Patch was landed upstream for 1.46 via commit 59037c5357d39c6d0f14a0aff70e67dc13eafc84

            adilger Andreas Dilger added a comment:

            To answer my own question, the bigalloc patches are on the master branch of the e2fsprogs repo, but not in the maint branch for 1.45.6.

            adilger Andreas Dilger added a comment:

            Dongyang, have these patches been submitted upstream yet?

            gerrit Gerrit Updater added a comment:

            Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/35781
            Subject: LU-12505 mke2fs: set overhead in super block for bigalloc
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set: 1
            Commit: 8624a496ff7c3e4fd69fb7217ff56030111f4460

            shadow Alexey Lyashkov added a comment:

            > It looks like ldiskfs_get_group_desc() and ldiskfs_calculate_overhead() take most of the CPU cycles for a long while during mount.

            Calculate the overhead only once and store it in the superblock for later use.

            > 46.30% libext2fs.so.2.4 [.] rb_test_bmap
            > 32.98% libext2fs.so.2.4 [.] ext2fs_test_generic_bmap

            This is a known problem. The bitmaps in e2fsprogs are not well designed for the case where a word has several bits set; replacing them with IDR (from the kernel) could improve speed dramatically.
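
The "calculate once and store" idea in the comment above can be sketched as a toy model. This is not the ldiskfs or e2fsprogs code: `ToySuperblock`, `stored_overhead`, and both functions are hypothetical names. Where the value gets recorded (at format time, as the mke2fs patch subject on this ticket suggests, or at first mount) is left open here.

```python
class ToySuperblock:
    """Hypothetical stand-in for a superblock; not the real on-disk layout."""

    def __init__(self, group_overheads):
        self.group_overheads = list(group_overheads)
        self.stored_overhead = 0   # 0 means "not recorded yet"
        self.scans = 0             # how many full scans were needed


def calculate_overhead(sb):
    """Expensive path: visit every group to sum its metadata overhead."""
    sb.scans += 1
    return sum(sb.group_overheads)


def overhead_at_mount(sb):
    """Use the recorded value when present, otherwise compute and record it."""
    if sb.stored_overhead == 0:
        sb.stored_overhead = calculate_overhead(sb)
    return sb.stored_overhead


if __name__ == "__main__":
    sb = ToySuperblock([3, 5, 7])
    overhead_at_mount(sb)   # first call performs the scan
    overhead_at_mount(sb)   # second call reuses the recorded value
    print(sb.stored_overhead, sb.scans)
```

In this model a second mount reads the recorded value instead of walking the groups again, which is the saving the comment describes.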

            gerrit Gerrit Updater added a comment:

            Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/35659
            Subject: LU-12505 libext2fs: optimize ext2fs_convert_subcluster_bitmap()
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set: 1
            Commit: 47d5bc9d922585229dfd5da82a1f19ff93bea28e

            adilger Andreas Dilger added a comment:

            It wouldn't be a bad idea to post an email to linux-ext4 with this information. Maybe we can get some input on how to fix it, or Ted will "just know" the best way to fix the problem.

            sihara Shuichi Ihara added a comment:

            Maybe it would be better to test with a newer kernel to see whether the same behavior is reproduced?
            By the way, running mke2fs on a bigalloc-enabled OST is also very slow.

            without bigalloc

            # time mkfs.lustre --ost --servicenode=127.0.0.2@tcp --fsname=scratch0 --index=2 --mgsnode=127.0.0.2@tcp --mkfsoptions='-E lazy_itable_init=0,lazy_journal_init=0,stripe_width=512,stride=512 -O meta_bg,^resize_inode -m1 -J size=4096' --reformat --backfstype=ldiskfs /dev/ddn/scratch0_ost0tune2fs -E mmp_update_interval=5 /dev/ddn/scratch0_ost0002

            real    9m11.614s
            user    0m59.894s
            sys     7m10.594s

            with bigalloc

            # time mkfs.lustre --ost --servicenode=127.0.0.2@tcp --fsname=scratch0 --index=0 --mgsnode=127.0.0.2@tcp --mkfsoptions='-E lazy_itable_init=0,lazy_journal_init=0,stripe_width=512,stride=512 -O bigalloc -C 131072 -m1 -J size=4096' --reformat --backfstype=ldiskfs /dev/ddn/scratch0_ost0000

            real    43m5.349s
            user    24m29.652s
            sys     18m35.058s

            Most of the CPU time is consumed in the following functions, which I did not see when running mke2fs without '-O bigalloc':

            Samples: 24K of event 'cycles', Event count (approx.): 14154870804
            Overhead  Shared Object      Symbol
              46.30%  libext2fs.so.2.4   [.] rb_test_bmap
              32.98%  libext2fs.so.2.4   [.] ext2fs_test_generic_bmap
              13.10%  libext2fs.so.2.4   [.] ext2fs_convert_subcluster_bitmap
               6.96%  libext2fs.so.2.4   [.] ext2fs_test_generic_bmap@plt
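
The profile is dominated by per-bit test functions, which would be consistent with a block-to-cluster bitmap conversion that tests one bit at a time. The sketch below contrasts that with jumping from one set bit to the next. It is a toy model, not the libext2fs code: both function names are hypothetical, all set blocks are assumed to lie below `total_blocks`, and it is an assumption that the patch on this ticket takes a similar approach.

```python
from bisect import bisect_left


def convert_bit_by_bit(set_blocks, total_blocks, ratio):
    """Test every block number, as a per-bit lookup loop would."""
    present = set(set_blocks)
    clusters = set()
    tests = 0
    for block in range(total_blocks):
        tests += 1
        if block in present:
            clusters.add(block // ratio)
    return sorted(clusters), tests


def convert_by_skipping(set_blocks, total_blocks, ratio):
    """Jump to the next set block, then skip the rest of its cluster."""
    ordered = sorted(set_blocks)
    clusters = []
    lookups = 0
    block = 0
    while block < total_blocks:
        lookups += 1
        # Equivalent of "find the first set bit at or after `block`".
        idx = bisect_left(ordered, block)
        if idx == len(ordered):
            break
        cluster = ordered[idx] // ratio
        clusters.append(cluster)
        # The cluster is already marked; nothing more to learn inside it.
        block = (cluster + 1) * ratio
    return clusters, lookups


if __name__ == "__main__":
    blocks = [0, 1, 40, 1000]
    print(convert_bit_by_bit(blocks, 4096, 32))
    print(convert_by_skipping(blocks, 4096, 32))
```

On a mostly empty bitmap, such as a freshly formatted file system, the skipping variant does work proportional to the number of used clusters rather than to the number of blocks.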

            People

              dongyang Dongyang Li
              sihara Shuichi Ihara