Lustre / LU-12988

improve mount time on huge ldiskfs filesystem

Details


    Description

      During Lustre server startup a few small files need to be updated (e.g. the config backup).
      At this point the buddy/bitmap cache is empty, but mballoc wants to find a big chunk of free space for group preallocation and reads bitmaps one by one.
      Sometimes this can take a very long time.
      One possible workaround is to disable preallocation during mount.
      A long-term plan is to limit scanning and prefetch bitmaps, but this needs more effort and will be tracked separately (LU-12970).
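
      As a rough illustration of the scale involved, a back-of-the-envelope sketch (assuming 4KiB blocks, the default 128MiB block groups, and one bitmap read per group; the ~124TiB size and 5ms latency are taken from the test results further down, not from this description):

      fs_bytes=$((124 * 2**40))              # ~124TiB OST
      group_bytes=$((128 * 2**20))           # 32768 blocks * 4KiB per block group
      groups=$((fs_bytes / group_bytes))     # ~1,015,808 block groups
      echo "$groups block bitmaps to read in the worst case"
      echo "$((groups * 5 / 1000 / 60)) minutes at 5ms per random read"   # ~84 minutes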

      Attachments

        Issue Links

          Activity

            [LU-12988] improve mount time on huge ldiskfs filesystem

            bzzz Alex Zhuravlev added a comment - hmm, I did remount a few times.. a newly-created filesystem shouldn't demonstrate any degradation even w/o the patches, as a perfect group would be found immediately?

            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - bzzz I hope I am wrong... I also got such optimistic results with my "adjusting table" script, but then I realized that only the FIRST mount is slow. I recreated the partition and then the mount became slow again even with my optimizations. bzzz, could you check your performance improvement on a freshly-created filesystem?
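
            A minimal sketch of the check being asked for here (the fsname, index, MGS NID variable, device and mount point are illustrative, not the ones from this setup):

            mkfs.lustre --ost --reformat --fsname=testfs --index=2 \
                --mgsnode=$MGS_NID /dev/mapper/dm-slow              # recreate the target from scratch
            time mount -t lustre /dev/mapper/dm-slow /mnt/ost2      # first mount after mkfs
            umount /mnt/ost2
            echo 3 > /proc/sys/vm/drop_caches                       # drop caches before repeating
            time mount -t lustre /dev/mapper/dm-slow /mnt/ost2      # repeat mount for comparison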

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37155/
            Subject: LU-12988 osd: do not use preallocation during mount
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 2331cd1fa178b348d8aa048abbb5160ac9353461


            bzzz Alex Zhuravlev added a comment - "Do you also have Artem's mballoc patch applied?" Yes, AFAICS, and that didn't help; probably the fill level wasn't high enough. I'll check tomorrow.

            adilger Andreas Dilger added a comment - Do you also have Artem's mballoc patch applied? If not, should that be reverted from master, or do you think it makes sense to keep it, but with lower thresholds (e.g. 15%, 10%, 5% or whatever)?

            bzzz Alex Zhuravlev added a comment - both https://review.whamcloud.com/#/c/36891/ and https://review.whamcloud.com/#/c/36893/ were used and need a refresh; working on this..

            adilger Andreas Dilger added a comment - Alex, which patch(es) were used for your test? Can you please update the commit message for your patch with these test results so we can get them landed.

            bzzz Alex Zhuravlev added a comment - on an 85% filled fs, using the "slow" devmapper target with a 5ms IO delay:

            /dev/mapper/dm-slow                                         124T  100T   18T  85% /mnt/ost2
            

            I tried to mount with non-patched ldiskfs, but interrupted it after about half an hour:

            # time mount -t lustre /dev/mapper/dm-slow /mnt/ost2
            ^C
            
            real	27m42.190s
            user	0m0.000s
            sys	0m0.002s
            
            procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
             r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
             1  1      0 13311820 398560 6025368    0    0   200     0   67  332  0  0 50 50  0
             0  1      0 13311200 398760 6025680    0    0   200     0   65  327  0  0 50 50  1
             0  1      0 13310456 398960 6026220    0    0   200     0   66  338  0  0 50 50  0
             0  1      0 13309836 399160 6026620    0    0   200     0  162  426  0  0 50 50  0
             0  1      0 13309092 399360 6027036    0    0   200     0  165  422  0  0 50 50  1
             0  1      0 13308472 399560 6027428    0    0   200     0  169  434  0  0 50 50  0
             0  1      0 13308100 399760 6027912    0    0   200     0  164  421  0  0 50 50  0
             0  1      0 13307356 399960 6028240    0    0   200     0  168  439  0  0 50 50  0
            

            According to vmstat the mount was going to take ~11 hours.
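
            A hedged sanity check of that projection from the numbers above (assuming vmstat's bi column is in 1KiB blocks, so ~200KiB/s is roughly 50 4KiB metadata reads per second, and reusing the ~1M block-group estimate from the description sketch):

            reads_per_sec=50                     # ~200KiB/s observed / 4KiB per bitmap
            groups=1015808                       # ~124TiB / 128MiB per block group
            echo "$((groups / reads_per_sec / 3600)) hours for block bitmaps alone"   # ~5-6 hours
            # roughly doubling that for inode bitmaps and other metadata lands near the ~11 hour estimate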

            Now with patched ldiskfs, with the cache wiped by echo 3 > /proc/sys/vm/drop_caches:

            # time mount -t lustre /dev/mapper/dm-slow /mnt/ost2
            
            real	0m20.364s
            user	0m0.192s
            sys	0m1.103s
            
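            For repeating this kind of cold-cache measurement, a minimal sketch (the device and mount point are the ones from this test; the loop itself is illustrative, not the exact procedure used here):

            for i in 1 2 3; do
                umount /mnt/ost2 2>/dev/null
                echo 3 > /proc/sys/vm/drop_caches                    # wipe page/buffer/inode caches
                time mount -t lustre /dev/mapper/dm-slow /mnt/ost2   # cold-cache mount time
            done
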
            echo "0 134217728000  delay /dev/loop0 0 5" | dmsetup create dm-slow
            

            so now:

            # dd if=/dev/loop0 of=/dev/null bs=4k count=12800
            12800+0 records in
            12800+0 records out
            52428800 bytes (52 MB, 50 MiB) copied, 0.0662961 s, 791 MB/s
            # dd if=/dev/mapper/dm-slow of=/dev/null bs=4k count=12800
            12800+0 records in
            12800+0 records out
            52428800 bytes (52 MB, 50 MiB) copied, 4.05419 s, 12.9 MB/s
            

            trying to start Lustre on this virtual device..
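
            For context, a sketch of how such a delayed device can be assembled from scratch (the backing-file path and size below are illustrative assumptions, not the ones used here):

            truncate -s 64T /srv/ost-backing.img                 # sparse backing file
            losetup /dev/loop0 /srv/ost-backing.img
            echo "0 $(blockdev --getsz /dev/loop0) delay /dev/loop0 0 5" | \
                dmsetup create dm-slow                           # add 5ms to every I/O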


            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - bzzz, do you have results comparing mount time with https://review.whamcloud.com/#/c/36704/ and without it? Please share them if you have any.

            bzzz Alex Zhuravlev added a comment - Artem's patch wasn't used. I'm thinking of a few policies.. will try to make an overview soon.

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: bzzz Alex Zhuravlev
              Votes: 0
              Watchers: 16
