Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-14580

Lustre 2.12.6 performance regression

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.12.6
    • 16 Socket Superdome Flex w/8 EDR IB interfaces
    • 2
    • 9223372036854775807

    Description

      Between Lustre 2.12.5 and 2.12.6 releases, we have a major performance regression on large systems with multiple interfaces.

      On a 16 socket Superdome Flex with 8 EDR interfaces functioning as a client of a filesystem capable of > 100 GB/s, using Lustre 2.12.5, we are measuring up to 50 GB/s. With Lustre 2.12.6, the same test is down to 13 GB/s.

      Reverting commit ID f92c7a161242c478658af09159a127bc21cba611 restores performance.

      Attachments

        Issue Links

          Activity

            [LU-14580] Lustre 2.12.6 performance regression

            I'm pretty confident this and LU-14055 are the same issue, so I'm snagging both of these.

            I'm going to leave this one open for now, but I think LU-14055 is the better place to continue the discussion.

            paf0186 Patrick Farrell added a comment - I'm pretty confident this and LU-14055 are the same issue, so I'm snagging both of these. I'm going to leave this one open for now, but I think LU-14055 is the better place to continue the discussion.
            pjones Peter Jones added a comment -

            Did you get any updated results on this Steve?

            pjones Peter Jones added a comment - Did you get any updated results on this Steve?

            @John Hammond Good idea! I tried re-introducing the padding to no effect on x86_64. 128-bit cache line architectures may have different results.

            schamp Stephen Champion added a comment - @John Hammond Good idea! I tried re-introducing the padding to no effect on x86_64. 128-bit cache line architectures may have different results.

            I don't see problems with patch itself. Increment in osc_consume_write_grant() was removed because it is done by atomic_long_add_return() now outside that call and it is done in both places where it is called. But maybe the patch "LU-12687 osc: consume grants for direct I/O" itself causes slowdown? Now grants are taken for Direct IO as well, so maybe that is related to not enough grants problem or similar. Are there any complains about grants on client during IOR run?

            tappro Mikhail Pershin added a comment - I don't see problems with patch itself. Increment in osc_consume_write_grant() was removed because it is done by atomic_long_add_return() now outside that call and it is done in both places where it is called. But maybe the patch " LU-12687 osc: consume grants for direct I/O" itself causes slowdown? Now grants are taken for Direct IO as well, so maybe that is related to not enough grants problem or similar. Are there any complains about grants on client during IOR run?

            Sure, I am checking that . These changes are combination of two patches actually: first ID is Ia047affc33fb9277e6c28a8f6d7d088c385b51a8 and next one is already referred f92c7a161242c478658af09159a127bc21cba611

            Their ports to 2.12 base were a bit different from master branch so I am checking if some functionality was broken. 

            tappro Mikhail Pershin added a comment - Sure, I am checking that . These changes are combination of two patches actually: first ID is Ia047affc33fb9277e6c28a8f6d7d088c385b51a8 and next one is already referred f92c7a161242c478658af09159a127bc21cba611 Their ports to 2.12 base were a bit different from master branch so I am checking if some functionality was broken. 
            neilb Neil Brown added a comment - - edited

            This looks to be a badly ported patch.

            Before the patch, osc_consume_write_grant() increments obd_dirty_pages.  After the patch it doesn't.

            There are two callers of osc_consume_write_grant().

            One of them, in osc_enter_cache_try() now increments obd_dirty_pages before calling osc_consume_write_grant()

            The other, in osc_queue_sync_pages(), hasn't been updated.  Presumably it needs an increment.  Though there is already an increment after the call.

            So the patch makes a behaviour change which wasn't intended.  I don't know what the correct behaviour is.

            @mpershin you did the backport: can you check it?

             

            neilb Neil Brown added a comment - - edited This looks to be a badly ported patch. Before the patch, osc_consume_write_grant() increments obd_dirty_pages.  After the patch it doesn't. There are two callers of osc_consume_write_grant(). One of them, in osc_enter_cache_try() now increments obd_dirty_pages before calling osc_consume_write_grant() The other, in osc_queue_sync_pages(), hasn't been updated.  Presumably it needs an increment.  Though there is already an increment after the call. So the patch makes a behaviour change which wasn't intended.  I don't know what the correct behaviour is. @mpershin you did the backport: can you check it?  
            jhammond John Hammond added a comment -

            schamp instead of reverting the entire f92c7a1 change, would it be possible to just revert the following hunk and rerun your benchmark?

            diff --git a/lustre/include/obd.h b/lustre/include/obd.h
            index 2b5fb97..8545fd4 100644
            --- a/lustre/include/obd.h
            +++ b/lustre/include/obd.h
            @@ -204,7 +204,6 @@ struct client_obd {
                    /* the grant values are protected by loi_list_lock below */
                    unsigned long            cl_dirty_pages;      /* all _dirty_ in pages */
                    unsigned long            cl_dirty_max_pages;  /* allowed w/o rpc */
            -       unsigned long            cl_dirty_transit;    /* dirty synchronous */
                    unsigned long            cl_avail_grant;   /* bytes of credit for ost */
                    unsigned long            cl_lost_grant;    /* lost credits (trunc) */
                    /* grant consumed for dirty pages */
            
            jhammond John Hammond added a comment - schamp instead of reverting the entire f92c7a1 change, would it be possible to just revert the following hunk and rerun your benchmark? diff --git a/lustre/include/obd.h b/lustre/include/obd.h index 2b5fb97..8545fd4 100644 --- a/lustre/include/obd.h +++ b/lustre/include/obd.h @@ -204,7 +204,6 @@ struct client_obd { /* the grant values are protected by loi_list_lock below */ unsigned long cl_dirty_pages; /* all _dirty_ in pages */ unsigned long cl_dirty_max_pages; /* allowed w/o rpc */ - unsigned long cl_dirty_transit; /* dirty synchronous */ unsigned long cl_avail_grant; /* bytes of credit for ost */ unsigned long cl_lost_grant; /* lost credits (trunc) */ /* grant consumed for dirty pages */
            pjones Peter Jones added a comment -

            Thanks for the heads-up Steve!

            pjones Peter Jones added a comment - Thanks for the heads-up Steve!

            FWIW, the test we are using is IOR w/ POSIX Direct IO:
            + mpirun sdf-1 240 /jet/home/champios/ior/bin/ior -i 3 -t 16m -b 13120m -s 1 -a POSIX --posix.odirect -e -F -w -k -E -D 30 -o /ocean/neocortex/tests/IOR-files/iorfile

            schamp Stephen Champion added a comment - FWIW, the test we are using is IOR w/ POSIX Direct IO: + mpirun sdf-1 240 /jet/home/champios/ior/bin/ior -i 3 -t 16m -b 13120m -s 1 -a POSIX --posix.odirect -e -F -w -k -E -D 30 -o /ocean/neocortex/tests/IOR-files/iorfile

            People

              neilb Neil Brown
              schamp Stephen Champion
              Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated: