
LU-6179: Lock ahead - Request extent locks from userspace

Details

    • Type: New Feature
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.11.0
    • None
    • 17290

    Description

      At the recent developers conference, Jinshan proposed a different method of approaching the performance problems described in LU-6148.

      Instead of introducing a new type of LDLM lock matching, we'd like to make it possible for user space to explicitly request LDLM locks asynchronously from the IO.

      I've implemented a prototype version of the feature and will be uploading it for comments. I'll explain the state of the current version in a comment momentarily.
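
      For reference, the interface that eventually landed ("LU-6179 llite: Implement ladvise lockahead", linked in the activity below) exposes this through llapi_ladvise(). The following is a minimal sketch of such a request, assuming the LU_LADVISE_LOCKAHEAD advice type and the lockahead field aliases from lustre_user.h; names and values should be verified against the headers in the tree in use:

      #include <stdio.h>
      #include <lustre/lustreapi.h>

      /*
       * Sketch only: ask the client to acquire a write extent lock covering
       * bytes [start, end] of an already-open Lustre file, without blocking
       * in the I/O path.  Constant and field names follow the lockahead
       * patches on this ticket; verify them against lustre_user.h and
       * lustreapi.h before use.
       */
      int request_write_lockahead(int fd, __u64 start, __u64 end)
      {
              struct llapi_lu_ladvise advice = { 0 };

              advice.lla_advice          = LU_LADVISE_LOCKAHEAD;
              advice.lla_lockahead_mode  = MODE_WRITE_USER;  /* PW extent lock */
              advice.lla_peradvice_flags = LF_ASYNC;         /* don't wait for the grant */
              advice.lla_start           = start;
              advice.lla_end             = end;

              if (llapi_ladvise(fd, 0, 1, &advice) < 0) {
                      perror("llapi_ladvise(LU_LADVISE_LOCKAHEAD)");
                      return -1;
              }

              /* The per-advice result field reports how the server handled it. */
              printf("lockahead result: %u\n", advice.lla_lockahead_result);
              return 0;
      }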

      Attachments

        1. anl_mpich_build_guide.txt (18 kB)
        2. cug paper.pdf (714 kB)
        3. lockahead_ladvise_mpich_patch (30 kB)
        4. LockAheadResults.docx (516 kB)
        5. LockAhead-TestReport.txt (1.36 MB)
        6. LUSTRE-LockAhead-140417-1056-170.pdf (64 kB)
        7. mmoore cug slides.pdf (1.15 MB)
        8. sle11_build_tools.tar.gz (2.48 MB)


          Activity


            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38109
            Subject: LU-6179 llite: remove LOCKAHEAD_OLD compatibility
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f6bc909bfda5521454631a4985648e07c63137ee
            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/13564/
            Subject: LU-6179 llite: Implement ladvise lockahead
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a8dcf372f430c308d3e96fb506563068d0a80c2d

            paf Patrick Farrell (Inactive) added a comment -

            Oh. Hm. No - it's skipping some of the tests. Sorry about that, thanks for pointing it out. Some development changes I was working on escaped into what I pushed upstream; I'll fix that when I rebase to merge.

            tappro Mikhail Pershin added a comment -

            Patrick, the new test 255c in sanity.sh reports the following:

            == sanity test 255c: suite of ladvise lockahead tests ================================================ 04:54:04 (1495688044)
            Starting test test10 at 1495688045
            Finishing test test10 at 1495688045
            Starting test test20 at 1495688045
            cannot give advice: Invalid argument (22)
            cannot give advice: Invalid argument (22)
            cannot give advice: Invalid argument (22)
            cannot give advice: Invalid argument (22)
            cannot give advice: Invalid argument (22)
            cannot give advice: Invalid argument (22)
            Finishing test test20 at 1495688045

            Is that expected, or is the test not working properly?

            This is from the latest test results: https://testing.hpdd.intel.com/sub_tests/38e1f10a-4129-11e7-91f4-5254006e85c2

            czx0003 Cong Xu (Inactive) added a comment -

            Hi Patrick,

            In the very simple test you suggested, the lock count does increase from 0 to 4010, so the lock ahead works well.
            Yes, a faster OST should show more benefits.

            paf Patrick Farrell (Inactive) added a comment -

            Huh, OK! That's a clever way to show it.

            A faster OST will show much larger benefits, of course.

            czx0003 Cong Xu (Inactive) added a comment -

            Hi Patrick,

            Thanks for the great suggestions! We conducted more tests recently and were able to demonstrate the benefit of Lock Ahead in test "2.3 Vary Lustre Stripe Size (Independent I/O)".

            In this test, the transfer size of each process is 1MB and the stripe size grows from 256KB to 16MB. When the stripe size reaches 16MB, 16 processes write to a single stripe simultaneously, leading to lock contention. In this configuration, the Lock Ahead code performs 21.5% better than the original code.
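
            To make the contended pattern concrete, the following is a minimal MPI-IO sketch of the independent-I/O case described above: each rank writes one 1MB block at offset rank*1MB of a shared file, so with a 16MB stripe size all 16 writes fall inside the same stripe (a single OST object) and compete for overlapping extent locks. The file path and rank count are illustrative only, not taken from the test report.

            #include <mpi.h>
            #include <stdlib.h>
            #include <string.h>

            int main(int argc, char **argv)
            {
                    const MPI_Offset xfer = 1 << 20;       /* 1 MiB per rank, as in test 2.3 */
                    int rank;
                    char *buf;
                    MPI_File fh;

                    MPI_Init(&argc, &argv);
                    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

                    buf = malloc(xfer);
                    memset(buf, 'x', xfer);

                    /* Shared file; the path is illustrative.  With a 16MB stripe size
                     * and 16 ranks, every block written below lands in the first stripe. */
                    MPI_File_open(MPI_COMM_WORLD, "/mnt/lustre/shared_file",
                                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
                    MPI_File_write_at(fh, rank * xfer, buf, (int)xfer, MPI_BYTE,
                                      MPI_STATUS_IGNORE);
                    MPI_File_close(&fh);

                    free(buf);
                    MPI_Finalize();
                    return 0;
            }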

            paf Patrick Farrell (Inactive) added a comment -

            Slides and paper from Cray User Group 2017 are attached. They contain real performance numbers on real hardware, including from real applications. Just for reference, in case anyone is curious.

            paf Patrick Farrell (Inactive) added a comment -

            Cong,

            Sorry to take a bit to get back to you.

            Given the numbers in section 2.4, you're barely seeing the problem, and lockahead does have some overhead, so I wouldn't necessarily expect it to help in that case. It would be much easier to see with faster OSTs, so I'd like to request RAM-backed OSTs.

            It's also possible something is wrong with the library. While I think we'll need RAM-backed OSTs (or at least much faster OSTs) to see a benefit, we can explore this possibility as well.

            Let's take one of the very simple tests, like a 1-stripe file with 1 process per client on 2 clients. I assume you're creating the file fresh before the test, but if not, please remove and re-create it right before the test. Then let's look at the lock count before and after running IOR (add the -k option so the file isn't deleted, otherwise the locks will be cleaned up).

            Specifically, on one of the clients, cat the lock count for the OST where the file is, before and after the test:

            cat /sys/fs/lustre/ldlm/namespaces/[OST]/lock_count

            If the file is not deleted and the lock count hasn't gone up, lock ahead didn't work for some reason.

            Again, I think we'll need RAM-backed OSTs regardless, but this check would be useful even without that.
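
            The check described above amounts to reading that lock_count file before and after the IOR run and comparing the two values; for convenience, here is an equivalent of the cat command as a tiny C program. The namespace path is passed on the command line and is site-specific.

            #include <stdio.h>

            /* Print the LDLM lock count from the given namespace file, e.g.
             * /sys/fs/lustre/ldlm/namespaces/[OST]/lock_count.
             * Run it before and after IOR (with -k); if the count has not
             * gone up, lockahead did not create any locks. */
            int main(int argc, char **argv)
            {
                    FILE *f;
                    long count;

                    if (argc != 2) {
                            fprintf(stderr, "usage: %s <lock_count path>\n", argv[0]);
                            return 1;
                    }
                    f = fopen(argv[1], "r");
                    if (f == NULL) {
                            perror(argv[1]);
                            return 1;
                    }
                    if (fscanf(f, "%ld", &count) != 1) {
                            fprintf(stderr, "could not parse %s\n", argv[1]);
                            fclose(f);
                            return 1;
                    }
                    fclose(f);
                    printf("%ld\n", count);
                    return 0;
            }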

            czx0003 Cong Xu (Inactive) added a comment -

            Hi Jinshan,

            Thanks for the suggestions! Yes, in our second test (Section 2.2 Vary number of processes (Independent I/O)), we scaled from 1 process per client with 8 clients (8 processes total) up to 64 processes per client with 8 clients (512 processes total).

            Hi Patrick,

            Yes, we have tried the simple test you suggested. Please have a look at the results in section 2.4: Simple Test (1 process and 2 processes accessing a single shared file on one OST).

            People

              Assignee: paf Patrick Farrell (Inactive)
              Reporter: paf Patrick Farrell (Inactive)
              Votes: 0
              Watchers: 23

              Dates

                Created:
                Updated:
                Resolved: