Lustre 2.12.6 performance regression

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.12.6
    • 16 Socket Superdome Flex w/8 EDR IB interfaces
    • 2

    Description

      Between the Lustre 2.12.5 and 2.12.6 releases, we see a major performance regression on large systems with multiple interfaces.

      On a 16-socket Superdome Flex with 8 EDR interfaces, acting as a client of a filesystem capable of > 100 GB/s, we measure up to 50 GB/s with Lustre 2.12.5. With Lustre 2.12.6, the same test drops to 13 GB/s.

      Reverting commit ID f92c7a161242c478658af09159a127bc21cba611 restores performance.

          Activity

            [LU-14580] Lustre 2.12.6 performance regression
            pjones Peter Jones made changes -
            Link Original: This issue is related to JFC-21 [ JFC-21 ]

            paf0186 Patrick Farrell added a comment -

            I'm pretty confident this and LU-14055 are the same issue, so I'm snagging both of these.

            I'm going to leave this one open for now, but I think LU-14055 is the better place to continue the discussion.
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-14055 [ LU-14055 ]
            rdruon Raphael Druon made changes -
            Link New: This issue is related to EX-3318 [ EX-3318 ]
            pjones Peter Jones added a comment -

            Did you get any updated results on this, Steve?

            schamp Stephen Champion added a comment -

            @John Hammond Good idea! I tried re-introducing the padding, to no effect on x86_64. Architectures with 128-byte cache lines may show different results.
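The thread does not show the padding itself. As a rough illustration of the general technique (keeping two frequently updated counters on separate cache lines to avoid false sharing), here is a minimal user-space C11 sketch. It assumes a 64-byte cache line; the struct, its field names, and the `counter_gap()` helper are hypothetical and are not the Lustre layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, typical on x86_64.
 * Some other architectures use 128-byte lines. */
#define CACHE_LINE 64

/* Hypothetical pair of hot counters; NOT the actual Lustre structure. */
struct padded_counters {
    alignas(CACHE_LINE) atomic_long counter_a; /* starts its own cache line */
    alignas(CACHE_LINE) atomic_long counter_b; /* pushed onto the next line */
};

/* Distance in bytes between the two counters inside the struct. */
static size_t counter_gap(void)
{
    return offsetof(struct padded_counters, counter_b) -
           offsetof(struct padded_counters, counter_a);
}

/* Compile-time check that the counters cannot share one cache line. */
static_assert(offsetof(struct padded_counters, counter_b) -
              offsetof(struct padded_counters, counter_a) >= CACHE_LINE,
              "counters must sit on separate cache lines");
```

Changing `CACHE_LINE` to 128 would model the wider-line case mentioned in the comment. Whether padding helps at all depends on the actual access pattern, which matches the "no effect on x86_64" observation above.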
            pjones Peter Jones made changes -
            Link New: This issue is related to JFC-21 [ JFC-21 ]

            tappro Mikhail Pershin added a comment -

            I don't see problems with the patch itself. The increment in osc_consume_write_grant() was removed because it is now done by atomic_long_add_return() outside that call, in both places where it is called. But maybe the patch "LU-12687 osc: consume grants for direct I/O" itself causes the slowdown? Grants are now taken for direct I/O as well, so this may be related to a shortage of grants or a similar problem. Are there any complaints about grants on the client during the IOR run?
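The pattern described in this comment, where the caller bumps a shared counter with an add-and-return operation so the callee no longer has to increment it, can be sketched in user-space C11. This is an illustration only: `add_return()` is a stand-in that mimics the kernel's `atomic_long_add_return()` (which returns the new value), and `reserve_dirty_page()` with its `max_dirty` cap is a hypothetical helper, not the Lustre code.

```c
#include <assert.h>
#include <stdatomic.h>

/* User-space stand-in for the kernel's atomic_long_add_return():
 * atomically add n and return the NEW value. */
static long add_return(atomic_long *v, long n)
{
    return atomic_fetch_add(v, n) + n;
}

/* Hypothetical caller-side reservation: count the page first, then
 * check the cap. Returns 1 if the page was accounted, 0 otherwise. */
static int reserve_dirty_page(atomic_long *dirty_pages, long max_dirty)
{
    assert(max_dirty >= 0); /* a negative cap would be a caller bug */

    if (add_return(dirty_pages, 1) <= max_dirty)
        return 1; /* already counted; the callee must not increment again */

    atomic_fetch_sub(dirty_pages, 1); /* over the cap: undo the increment */
    return 0;
}
```

Because every caller touches the same counter with an atomic read-modify-write, this pattern is correct but can become a contention point when many CPUs perform I/O at once.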

            tappro Mikhail Pershin added a comment -

            Sure, I am checking that. These changes are actually a combination of two patches: the first ID is Ia047affc33fb9277e6c28a8f6d7d088c385b51a8, and the second is the already-referenced f92c7a161242c478658af09159a127bc21cba611.

            Their ports to the 2.12 base differ slightly from the master branch versions, so I am checking whether some functionality was broken.
            pjones Peter Jones made changes -
            Link New: This issue is related to DDN-1997 [ DDN-1997 ]

            People

              neilb Neil Brown
              schamp Stephen Champion
              Votes: 0
              Watchers: 14
