[LU-14580] Lustre 2.12.6 performance regression - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.12.6
Labels:
- performance
Environment:
16 Socket Superdome Flex w/8 EDR IB interfaces

Severity:
2
Rank (Obsolete):
9223372036854775807

Description

Between the Lustre 2.12.5 and 2.12.6 releases, we see a major performance regression on large systems with multiple interfaces.

On a 16 socket Superdome Flex with 8 EDR interfaces functioning as a client of a filesystem capable of > 100 GB/s, using Lustre 2.12.5, we are measuring up to 50 GB/s. With Lustre 2.12.6, the same test is down to 13 GB/s.

Reverting commit ID f92c7a161242c478658af09159a127bc21cba611 restores performance.

Issue Links

is related to

LU-14055 Write performance regression caused by an commit from LU-13344

Resolved

LU-12820 obd_dirty_transit_pages is always zero and can be removed

Resolved

Activity

Patrick Farrell added a comment - 14/Jun/21 8:57 PM

I'm pretty confident this and LU-14055 are the same issue, so I'm snagging both of these.

I'm going to leave this one open for now, but I think LU-14055 is the better place to continue the discussion.

Peter Jones added a comment - 28/May/21 6:15 PM

Did you get any updated results on this Steve?

Stephen Champion added a comment - 17/May/21 9:16 PM

@John Hammond Good idea! I tried re-introducing the padding, to no effect on x86_64. Architectures with 128-byte cache lines may give different results.

Mikhail Pershin added a comment - 08/Apr/21 12:24 PM

I don't see a problem with the patch itself. The increment in osc_consume_write_grant() was removed because it is now done by atomic_long_add_return() outside that call, in both places where it is called. But maybe the patch "LU-12687 osc: consume grants for direct I/O" itself causes the slowdown? Grants are now taken for Direct IO as well, so the regression may be related to a shortage of grants or something similar. Are there any complaints about grants on the client during the IOR run?

Mikhail Pershin added a comment - 08/Apr/21 7:41 AM

Sure, I am checking that. These changes are actually a combination of two patches: the first has ID Ia047affc33fb9277e6c28a8f6d7d088c385b51a8 and the second is the already-mentioned f92c7a161242c478658af09159a127bc21cba611.

Their ports to the 2.12 base differed a bit from the master branch, so I am checking whether some functionality was broken.

Neil Brown added a comment - 03/Apr/21 1:27 AM - edited

This looks to be a badly ported patch.

Before the patch, osc_consume_write_grant() increments obd_dirty_pages. After the patch it doesn't.

There are two callers of osc_consume_write_grant().

One of them, in osc_enter_cache_try(), now increments obd_dirty_pages before calling osc_consume_write_grant().

The other, in osc_queue_sync_pages(), hasn't been updated. Presumably it needs an increment, though there is already an increment after the call.

So the patch makes a behaviour change which wasn't intended. I don't know what the correct behaviour is.

@mpershin you did the backport: can you check it?
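
To make the concern concrete, here is a minimal toy model of the pattern described above. It is not Lustre code: every `toy_` name, the struct, and the page size are hypothetical, and the "missed" caller is shown only as what would happen if one call site were left unchanged. Whether that is what actually happened in the backport is exactly the open question.

```c
/* Hypothetical stand-in for client state; NOT the real struct client_obd. */
struct toy_client {
	long dirty_pages;	/* pages currently counted as dirty */
	long avail_grant;	/* bytes of grant still available */
};

#define TOY_PAGE_SIZE 4096L	/* assumed page size for the illustration */

/* After the refactor the helper only consumes grant; it no longer
 * touches dirty_pages. */
void toy_consume_grant(struct toy_client *cli)
{
	cli->avail_grant -= TOY_PAGE_SIZE;
}

/* Caller A was updated: it accounts for the dirty page itself. */
void toy_enter_cache(struct toy_client *cli)
{
	cli->dirty_pages++;
	toy_consume_grant(cli);
}

/* Caller B as it would look if it had been missed: no increment. */
void toy_queue_sync_missed(struct toy_client *cli)
{
	toy_consume_grant(cli);
}

/* Release path: undoes one page of accounting for either caller. */
void toy_release(struct toy_client *cli)
{
	cli->dirty_pages--;
	cli->avail_grant += TOY_PAGE_SIZE;
}
```

In this toy, a page that goes through `toy_enter_cache()` and `toy_release()` leaves `dirty_pages` balanced, while one that goes through `toy_queue_sync_missed()` and `toy_release()` leaves it one too low. The real call sites have additional increments around them, so this only shows why each caller needs auditing.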

John Hammond added a comment - 02/Apr/21 8:55 PM

@schamp, instead of reverting the entire f92c7a1 change, would it be possible to revert just the following hunk and rerun your benchmark?

diff --git a/lustre/include/obd.h b/lustre/include/obd.h
index 2b5fb97..8545fd4 100644
--- a/lustre/include/obd.h
+++ b/lustre/include/obd.h
@@ -204,7 +204,6 @@ struct client_obd {
        /* the grant values are protected by loi_list_lock below */
        unsigned long            cl_dirty_pages;      /* all _dirty_ in pages */
        unsigned long            cl_dirty_max_pages;  /* allowed w/o rpc */
-       unsigned long            cl_dirty_transit;    /* dirty synchronous */
        unsigned long            cl_avail_grant;   /* bytes of credit for ost */
        unsigned long            cl_lost_grant;    /* lost credits (trunc) */
        /* grant consumed for dirty pages */
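
The reason a one-field hunk could matter is layout: removing a member shifts every later member, which can change which fields share a cache line. The sketch below illustrates that with cut-down, hypothetical structs (the `toy_` names and the 64-byte line size are assumptions; this is not the real `struct client_obd`).

```c
#include <stddef.h>

/* Hypothetical layouts; field names only echo the hunk above. */
struct toy_with_transit {
	unsigned long dirty_pages;
	unsigned long dirty_max_pages;
	unsigned long dirty_transit;	/* the member the hunk removes */
	unsigned long avail_grant;
	unsigned long lost_grant;
};

struct toy_without_transit {
	unsigned long dirty_pages;
	unsigned long dirty_max_pages;
	unsigned long avail_grant;
	unsigned long lost_grant;
};

#define TOY_CACHE_LINE 64	/* assumed line size, typical for x86_64 */

/* Cache line index of a byte offset, assuming the struct itself
 * starts on a cache line boundary. */
size_t toy_line_of(size_t offset)
{
	return offset / TOY_CACHE_LINE;
}

/* How far avail_grant moves when dirty_transit is removed. */
size_t toy_shift_avail_grant(void)
{
	return offsetof(struct toy_with_transit, avail_grant) -
	       offsetof(struct toy_without_transit, avail_grant);
}
```

Every member after the removed one moves down by `sizeof(unsigned long)`. In a large struct, that can push a hot field across a line boundary, which is what re-adding the member as padding would test.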

Peter Jones added a comment - 02/Apr/21 4:47 PM

Thanks for the heads-up Steve!

Stephen Champion added a comment - 02/Apr/21 4:01 PM

FWIW, the test we are using is IOR w/ POSIX Direct IO:
+ mpirun sdf-1 240 /jet/home/champios/ior/bin/ior -i 3 -t 16m -b 13120m -s 1 -a POSIX --posix.odirect -e -F -w -k -E -D 30 -o /ocean/neocortex/tests/IOR-files/iorfile
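
For a sense of scale, the sizes in that command line can be worked out as below. This is a rough sketch under two assumptions not stated in the ticket: that 240 is the number of MPI tasks, and that IOR's `m` suffix means MiB. The helper names are made up for the illustration.

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Number of -t sized transfers needed to write one -b sized block. */
uint64_t transfers_per_task(uint64_t block_mib, uint64_t transfer_mib)
{
	return block_mib / transfer_mib;
}

/* Total bytes written per iteration if every task finishes:
 * tasks * segments (-s) * block size (-b). */
uint64_t aggregate_bytes(uint64_t tasks, uint64_t block_mib,
			 uint64_t segments)
{
	return tasks * segments * block_mib * MIB;
}
```

Under those assumptions, `transfers_per_task(13120, 16)` is 820 direct-IO writes of 16 MiB per task, and `aggregate_bytes(240, 13120, 1)` is 3075 GiB per iteration, with one file per task because of `-F`.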

People

Assignee: Neil Brown

Reporter: Stephen Champion

Votes: 0

Watchers: 14

Dates

Created: 02/Apr/21 3:40 PM

Updated: 04/Jul/21 2:52 PM