  Lustre / LU-12748

parallel readahead needs to be optimized at high number of processes

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0
    • None
    • None
    • master
    • 3

    Description

      parallel readahead is enabled by default in master, and it contributes a lot to sequential read performance.
      However, if the number of IO threads is increased (e.g. NP=NCPU), read performance drops and is lower than without readahead. It needs tuning and optimization.
      Here is the test configuration and results.

      Client
      2 x Platinum 8160 CPU @ 2.10GHz, 192GB memory, 2 x IB-EDR (multi-rail)
      CentOS 7.6 (3.10.0-957.27.2.el7.x86_64)
      OFED-4.5
      
      for i in 6 12 24 48; do
              size=$((768/i))
              /work/tools/mpi/gcc/openmpi/2.1.1/bin/mpirun --allow-run-as-root -np $i /work/tools/bin/ior -w -r -t 1m -b ${size}g -e -F -vv -o /scratch0/file | tee ior-1n${i}p-${VER}.log
      done
      

      Summary of Read Performance (MB/sec)

      branch            thr=6   thr=12  thr=24  thr=48
      b2_12              9,965  14,551  17,177  18,152
      master            15,252  16,026  17,842  16,991
      master (pRA=off)  10,253  14,489  17,839  18,658

      pRA=off - parallel readahead disabled (llite.*.read_ahead_async_file_threshold_mb=0)
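
      For reference, the same tunable can be checked and toggled at run time with lctl. A minimal sketch; the 1024 restore value is only an example and should be replaced by the value reported on an unmodified client:

      # show the current async readahead threshold on the client
      lctl get_param llite.*.read_ahead_async_file_threshold_mb
      # disable parallel (async) readahead, as in the pRA=off runs above
      lctl set_param llite.*.read_ahead_async_file_threshold_mb=0
      # restore the previously reported threshold (1024 is an example value only)
      lctl set_param llite.*.read_ahead_async_file_threshold_mb=1024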

      Attachments

        Issue Links

          Activity

            [LU-12748] parallel readahead needs to be optimized at high number of processes
            lixi_wc Li Xi added a comment -

            That totally makes sense. Thanks for the explanation, Andreas!


            adilger Andreas Dilger added a comment -

            Li Xi, note that the ability to change labels on the ticket is one of the reasons that we only mark tickets "Resolved" instead of "Closed". Otherwise, it is necessary to re-open and close the ticket to change it again.


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37927/
            Subject: LU-12748 readahead: limit async ra requests
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1427a72002e6b57017f1c66eb95f9bebff9ac37f


            simmonsja James A Simmons added a comment -

            I was just discussing this issue with Wang in the context of the LU-13258 work. Thanks for figuring out the crossover. I will update my patch with this new limit.


            gerrit Gerrit Updater added a comment -

            Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37927
            Subject: LU-12748 readahead: limit async ra requests
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 447b93d563e555f3255995234b35c4546960768e


            adilger Andreas Dilger added a comment -

            It looks like the crossover is at about NCPU/2, where the performance of parallel readahead and in-process readahead is the same. If we stop using async readahead at that point, it should give the best of both worlds.

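            As a rough user-space approximation of that crossover heuristic (the merged patch 37927 instead limits async RA requests on the client side), a benchmark wrapper could switch the llite threshold from the description off whenever the process count exceeds NCPU/2; the np and 1024 values below are placeholders:

            ncpu=$(nproc)
            np=48                                # IO processes planned for this run
            if [ "$np" -gt $((ncpu / 2)) ]; then
                    # past the crossover: fall back to in-process readahead
                    lctl set_param llite.*.read_ahead_async_file_threshold_mb=0
            else
                    # at or below the crossover: leave parallel readahead enabled
                    # (1024 is an example value; use the client's normal setting)
                    lctl set_param llite.*.read_ahead_async_file_threshold_mb=1024
            fi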

            wshilong Wang Shilong (Inactive) added a comment -

            The problem could be that we try to submit too many async RA workers; even limiting the number of active workers for the workqueue did not help.

            I think to fix the problem we could introduce a similar idea to what we did to limit RA memory: introduce another atomic counter to record the in-flight active async RA requests.

            We can then limit the in-flight active async RA to the number of active CPU cores etc., which gives us a balance between single-thread improvements and reduced contention across multiple threads.

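            As a user-space analogy of that in-flight limit (a sketch only; the proposed counter would live in the Lustre client readahead path), a driver loop that never keeps more than NCPU background readers running could look like the following; the /scratch0/file.* glob assumes the per-rank files from the IOR run above:

            # keep at most NCPU prefetch workers in flight at any time
            ncpu=$(nproc)
            for f in /scratch0/file.*; do
                    while [ "$(jobs -rp | wc -l)" -ge "$ncpu" ]; do
                            sleep 0.1                      # wait for a slot to free up
                    done
                    dd if="$f" of=/dev/null bs=1M &        # stand-in for one async RA worker
            done
            wait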

            People

              Assignee: wshilong Wang Shilong (Inactive)
              Reporter: sihara Shuichi Ihara
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: