[LU-17428] reduce default value for lru_max_age to 300s

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.16.0

    Description

      The default value for lru_max_age is currently 3900s (3900000 msec), or 1h5m.
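As a quick check of the units quoted above, here is a minimal sketch (plain arithmetic, not Lustre code) that converts the old default from milliseconds and compares it with the 300s value proposed in the ticket title. The variable names are only illustrative.

```python
from datetime import timedelta

OLD_DEFAULT_MSEC = 3_900_000  # old default quoted in the description
NEW_DEFAULT_SEC = 300         # proposed default from the ticket title

old = timedelta(milliseconds=OLD_DEFAULT_MSEC)
new = timedelta(seconds=NEW_DEFAULT_SEC)

# Break the old default down into hours and minutes.
old_seconds = int(old.total_seconds())
hours, rem = divmod(old_seconds, 3600)
minutes = rem // 60

# Dividing one timedelta by another yields a plain float.
ratio = old / new

print(f"old default: {old_seconds}s = {hours}h{minutes}m")
print(f"new default: {NEW_DEFAULT_SEC}s = {NEW_DEFAULT_SEC // 60}m")
print(f"the new limit is {ratio:g}x shorter")
```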

      For most systems and use cases this is far too long. Clients can hold on to LDLM locks that they no longer need, which may cause problems for other clients if the lock-holding clients have network issues and the server lock callback RPCs are slow. It may also hurt the LRU: clients may keep locks that were used only once for a long time and evict more important locks sooner, though the DLM LRU algorithm still needs more work in LU-11509.

      We regularly tune ldlm.namespaces.*.lru_max_age=300s on large clusters, and it makes sense to change this to be the default.
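The tuning mentioned above can be sketched as a small helper that only builds the `lctl set_param` command line, without running it. This is an illustration, not part of the patch: the helper name `lru_max_age_command` is hypothetical, the `300s` value with its `s` suffix is copied from the ticket text, and whether a given Lustre release accepts that suffix (rather than a value in milliseconds) is an assumption to verify on the target system.

```python
import shlex


def lru_max_age_command(value="300s", namespaces="*"):
    """Return the argv for an lctl set_param call (nothing is executed here)."""
    param = f"ldlm.namespaces.{namespaces}.lru_max_age={value}"
    return ["lctl", "set_param", param]


argv = lru_max_age_command()

# shlex.join quotes the parameter so that the shell does not expand the "*"
# and the wildcard reaches lctl unchanged.
print(shlex.join(argv))
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without Lustre; on a client the resulting line would be run as root.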

          Activity

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56377/
            Subject: LU-17428 tests: restore recovery-small/10a lru_max_age
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: c73f731f252b2628dc17de315f79bbf5d86965e0


            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56377
            Subject: LU-17428 tests: restore recovery-small/10a lru_max_age
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e7749b1808864efe7e60c6491a9f58f69fa85c7b

            pjones Peter Jones added a comment -

            Merged for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53682/
            Subject: LU-17428 ldlm: reduce default lru_max_age
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 357cae970c5c45e8d58574db3c38b60e22565b6d


            sihara Shuichi Ihara added a comment -

            Here are the test results with and without the patch for 51M files.

            Server: Rockylinux8.8
            32 x Client: Rockylinux8.9 (4.18.0-513.18.1.el8_9.x86_64) with Infiniband

            Cached operations

            [root@src01-c0-n0 ~]# salloc -p src -N 32 -n 1024 --ntasks-per-node=32 /usr/mpi/gcc/openmpi-4.1.7a1/bin/mpirun --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /lustre/mdtest.out -F -v -w 32k -u

            master (commit: 32582842ca)

            SUMMARY rate: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation             :      91130.204      91129.714      91129.933          0.134
               File stat                 :     354354.554     354352.575     354353.510          0.527
               File read                 :     595256.872     595253.386     595255.047          0.910
               File removal              :     350576.857     350574.930     350575.820          0.516
               Tree creation             :          9.731          9.731          9.731          0.000
               Tree removal              :          4.760          4.760          4.760          0.000
            V-1: Entering PrintTimestamp...

            master (commit: 32582842ca) + patch (https://review.whamcloud.com/#/c/fs/lustre-release/+/53682/)

            SUMMARY rate: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation             :      93441.386      93440.882      93441.107          0.137
               File stat                 :     370829.765     370827.712     370828.641          0.550
               File read                 :     665331.466     665327.817     665329.454          0.970
               File removal              :     369667.588     369665.468     369666.457          0.557
               Tree creation             :          9.581          9.581          9.581          0.000
               Tree removal              :          4.083          4.083          4.083          0.000
            V-1: Entering PrintTimestamp...

            Non-cached operations (-N 1)

            [root@src01-c0-n0 ~]# salloc -p src -N 32 -n 1024 --ntasks-per-node=32 /usr/mpi/gcc/openmpi-4.1.7a1/bin/mpirun --allow-run-as-root /work/tools/bin/mdtest -n 50000 -d /lustre/mdtest.out -F -v -w 32k -u -N 1

            master (commit: 32582842ca)

            SUMMARY rate: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation             :      90541.429      90540.940      90541.157          0.133
               File stat                 :     398232.530     398230.361     398231.326          0.584
               File read                 :     643103.778     643100.309     643101.978          0.961
               File removal              :     352778.372     352776.450     352777.297          0.517
               Tree creation             :          8.149          8.149          8.149          0.000
               Tree removal              :          3.212          3.212          3.212          0.000
            V-1: Entering PrintTimestamp...

            master (commit: 32582842ca) + patch (https://review.whamcloud.com/#/c/fs/lustre-release/+/53682/)

            SUMMARY rate: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation             :      91919.556      91919.063      91919.280          0.134
               File stat                 :     406482.909     406480.761     406481.688          0.585
               File read                 :     571544.837     571541.607     571543.159          0.878
               File removal              :     308864.096     308862.455     308863.205          0.450
               Tree creation             :          8.051          8.051          8.051          0.000
               Tree removal              :          3.767          3.767          3.767          0.000
            V-1: Entering PrintTimestamp...

            I didn't see any obvious performance regression with the patch applied.
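To make the comparison in the mdtest summaries above easier to read, the following sketch computes the relative change in the Mean rate for each file operation, patched versus master. The numbers are copied from the tables; the names `RESULTS` and `percent_change` are illustrative only. Each summary reports a single iteration, so this arithmetic says nothing about run-to-run variation.

```python
# Mean rates from the mdtest summaries above: (master, master + patch).
RESULTS = {
    "cached": {
        "File creation": (91129.933, 93441.107),
        "File stat": (354353.510, 370828.641),
        "File read": (595255.047, 665329.454),
        "File removal": (350575.820, 369666.457),
    },
    "non-cached (-N 1)": {
        "File creation": (90541.157, 91919.280),
        "File stat": (398231.326, 406481.688),
        "File read": (643101.978, 571543.159),
        "File removal": (352777.297, 308863.205),
    },
}


def percent_change(baseline, patched):
    """Relative change of the patched rate against the baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


changes = {
    run: {op: percent_change(*pair) for op, pair in ops.items()}
    for run, ops in RESULTS.items()
}

for run, ops in changes.items():
    print(run)
    for op, pct in ops.items():
        print(f"  {op:<14} {pct:+6.2f}%")

# 1024 MPI tasks (-n 1024 to salloc) times 50000 files per task (mdtest -n).
TOTAL_FILES = 1024 * 50000
print(f"total files: {TOTAL_FILES:,}")
```

With these inputs the four file operations in the cached run come out higher with the patch, while read and removal in the non-cached run come out roughly 11 to 12 percent lower. With only one iteration per configuration, it is not possible to say from these numbers alone how much of either difference is noise.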

            adilger Andreas Dilger added a comment -

            "Then there are other sites that hop up and down about how their terabytes of RAM on the clients go unused because we don't cache enough data."

            But that page cache removal is happening even when there is only a single file and the application is using it. I don't think that is caused by DLM lock cancellation.

            Yingjin is fixing the mlock handling in LU-17463 so that vmtouch can pin the files in memory.

            green Oleg Drokin added a comment -

            Then there are other sites that hop up and down about how their terabytes of RAM on the clients go unused because we don't cache enough data.

            Are we seeing more ldlm pools breakage, so that locks are not cancelled fast enough?

            In the ideal world we'd cache locks for a long time as long as RAM permits, I think.

            It was bad enough when we dropped the lru age from 10h to 1h, and now down to 5 minutes?

            Do we at least have some ticket to hopefully do a smarter thing and allow longer caching times?


            adilger Andreas Dilger added a comment -

            Oleg, there are quite a number of sites that have issues (linked here) where clients have too many DLM locks lingering, which causes too much contention when other clients eventually access those locks, rather than cancelling them more quickly. Many of the linked tickets use lru_max_age between 300-900s, and I've recently been telling sites to use lru_max_age=300s; this patch just encodes that as the default.

            So I think this is more of a stability change than a performance change.

            green Oleg Drokin added a comment -

            sihara, I think this could have negative impacts on workloads that depend heavily on caching, such as various io500 tests. Can you please chime in here to avoid any unnecessary surprises later?


            "Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/doc/manual/+/53684/
            Subject: LU-17428 doc: remove default value for lru_age_max
            Project: doc/manual
            Branch: master
            Current Patch Set:
            Commit: 299591728836ae072cb537fae6b76a6ad1738208


            People

              Assignee: adilger Andreas Dilger
              Reporter: adilger Andreas Dilger