
Single client's performance degradation on 2.1

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.2.0, Lustre 2.3.0

    Description

      During performance testing on lustre-2.1, I saw a single-client performance degradation.
      Here are IOR results on a single client with 2.1 and, for comparison, with lustre-1.8.6.80.
      I ran IOR (IOR -t 1m -b 32g -w -r -vv -F -o /lustre/ior.out/file) on the single client with 1, 2, 4 and 8 processes.

      Write (MiB/sec)
      Processes   v1.8.6.80   v2.1
      1            446.25      411.43
      2            808.53      761.30
      4           1484.18     1151.41
      8           1967.42     1172.06

      Read (MiB/sec)
      Processes   v1.8.6.80   v2.1
      1            823.90      595.71
      2           1449.49     1071.76
      4           2502.49     1517.79
      8           3133.43     1746.30

      Both tests were run on the same infrastructure (hardware and network), and checksums were disabled on the client in both cases.
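
      For reference, below is a minimal sketch of how such a run could be launched. The mpirun invocation and the osc.*.checksums tunable are my assumptions about the setup, not details taken from the report itself:

        # Disable client-side data checksums (assumed equivalent of "checksums disabled")
        $ lctl set_param osc.*.checksums=0

        # Run IOR on the single client with 1, 2, 4 and 8 processes
        $ for np in 1 2 4 8; do
            mpirun -np $np IOR -t 1m -b 32g -w -r -vv -F -o /lustre/ior.out/file
          done

      With -F (file per process), each rank writes and reads its own 32g file under /lustre/ior.out/.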

      Attachments

        1. 2.4 Single Client 3May2013.xlsx
          34 kB
        2. 574.1.pdf
          169 kB
        3. ior-256gb.tar.gz
          32 kB
        4. ior-32gb.tar.gz
          24 kB
        5. lu744-20120909.tar.gz
          883 kB
        6. lu744-20120915.tar.gz
          874 kB
        7. lu744-20120915-02.tar.gz
          1.02 MB
        8. lu744-20121111.tar.gz
          849 kB
        9. lu744-20121113.tar.gz
          846 kB
        10. lu744-20121117.tar.gz
          2.45 MB
        11. lu744-20130104.tar.gz
          915 kB
        12. lu744-20130104-02.tar.gz
          26 kB
        13. lu744-dls-20121113.tar.gz
          10 kB
        14. orig-collectl.out
          81 kB
        15. orig-ior.out
          2 kB
        16. orig-opreport-l.out
          146 kB
        17. patched-collectl.out
          34 kB
        18. patched-ior.out
          2 kB
        19. patched-opreport-l.out
          137 kB
        20. single-client-performance.xlsx
          42 kB
        21. stats-1.8.zip
          14 kB
        22. stats-2.1.zip
          64 kB
        23. test2-various-version.zip
          264 kB
        24. test-patchset-2.zip
          147 kB

        Issue Links

          Activity

            [LU-744] Single client's performance degradation on 2.1

            Jinshan,

            I just tested http://review.whamcloud.com/4943

            Attached are all results and the oprofile output.
            It looks clearly better than the previous numbers, but I wonder whether we could get even better performance, since we sometimes reach 5.6GB/sec (see collectl.out); I would like to keep the results around those numbers.

            ihara Shuichi Ihara (Inactive) added a comment

            My next patch will remove the top cache of cl_page.

            jay Jinshan Xiong (Inactive) added a comment

            There is a new patch for performance tuning at http://review.whamcloud.com/4943. Please give it a try.

            jay Jinshan Xiong (Inactive) added a comment

            Hi Ihara, this is because the CPU is still under contention, so performance dropped when the housekeeping work started. Can you please run the benchmark one more time with patches 4519, 4472 and 4617? This should help a little bit.

            jay Jinshan Xiong (Inactive) added a comment
            prakash Prakash Surya (Inactive) added a comment - edited

            Jinshan, Frederik, when using the LU-2139 patches on the client but not on the server, it is normal to see the IO pause/stall you are describing. I'm not sure if this is what is happening here, but what can happen is:

            1. Client performs IO
            2. Client receives completion callback for bulk RPC
            3. Bulk pages now clean but "unstable" (uncommitted on OST)
            4. NR_UNSTABLE_NFS incremented for each unstable page (due to http://review.whamcloud.com/4245)
            5. NR_UNSTABLE_NFS grows larger than (background_thresh + dirty_thresh)/2
            6. Kernel stalls IO waiting for NR_UNSTABLE_NFS to decrease (via kernel function: balance_dirty_pages)
            7. Client receives a Lustre ping some time in the future (around 20 seconds later?), updating last_committed
            8. Bulk pages now "stable" on client and can be reclaimed, lowering NR_UNSTABLE_NFS
            9. Go back to step 1.

            Reading the above comments, it looks like the LU-2139 patches are working as intended (avoiding OOMs at the cost of performance). Although, I admit, the performance is terrible when you hit the NR_UNSTABLE_NFS limit and the kernel halts all IO (but it is better than OOM, IMO). To improve on this, http://review.whamcloud.com/4375 needs to be applied to both clients and servers. This will allow the server to proactively commit bulk pages as they come in, hopefully preventing the client from exhausting its memory with unstable pages and avoiding the "stall" in balance_dirty_pages. With it applied to the server, I'd expect NR_UNSTABLE_NFS to remain "low", and the 4GB file speeds to match the 1GB speeds.

            Please keep in mind, the LU-2139 patches are all experimental and subject to change.

            On the client, with the LU-2139 patches applied, you might find it interesting to watch lctl get_param llite.*.unstable_stats and cat /proc/meminfo | grep NFS_Unstable as the test is running.

            For example:

            $ watch -n0.1 'lctl get_param llite.*.unstable_stats'
            $ watch -n0.1 'cat /proc/meminfo | grep NFS_Unstable'
            

            Those will give you an idea of how many unstable pages the client has at a given time. If that value gets "high" (the exact value depends on your dirty limits, but probably around 1/4 of RAM), then what I detailed above is most likely the cause of the bad performance.
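
            If watch is not available, a simple logging loop gives the same view with timestamps, and the two vm.dirty_* sysctls let you compare NFS_Unstable against the (background_thresh + dirty_thresh)/2 limit from step 5. This is only a minimal sketch; the one-second interval and log file name are arbitrary:

            # Show the knobs the kernel derives background_thresh/dirty_thresh from
            $ sysctl vm.dirty_ratio vm.dirty_background_ratio

            # Log unstable/dirty page counters once per second while IOR runs
            $ while true; do
                date +%T
                lctl get_param llite.*.unstable_stats
                grep -E 'NFS_Unstable|Dirty|Writeback' /proc/meminfo
                sleep 1
              done | tee unstable-pages.log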


            Jinshan,

            Yes, I upgraded the MPI library a couple of weeks ago. I found a hardware problem and fixed it. Now mca_btl_sm_component_progress consumes less CPU; it's still high compared to the previous library, though...

            This attachment includes three test results:

            1. master without any patches
            2. master + 4519 (2nd patch) + 4472 (2nd patch)
            3. master + 4519 (2nd patch) + 4472 (2nd patch), with MPI run using pthreads instead of shared memory (see the sketch below)
            

            The patches reduce CPU consumption and improve the performance, but performance still drops when the client has no free memory.
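
            On the shared-memory point, one way to take the sm BTL (and mca_btl_sm_component_progress) out of the picture in Open MPI is to exclude it at run time. This is only a sketch; the exact MPI configuration used for test 3 is not recorded here:

            # Exclude the shared-memory BTL; ranks fall back to the remaining transports (e.g. TCP/IB)
            $ mpirun -np 8 --mca btl ^sm IOR -t 1m -b 32g -w -r -vv -F -o /lustre/ior.out/file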

            ihara Shuichi Ihara (Inactive) added a comment

            Hi Ihara, I saw significant CPU usage for the libraries mca_btl_sm.so (11.7%) and libopen-pal.so.0.0.0 (4.7%), but in the performance data from Sep 5 they consumed only 0.13% and 0.05%. They are Open MPI libraries. Did you upgrade these libraries?

            Anyway, I revised patch 4519 and restored 4472 to remove memory stalls; please apply them in your next benchmark. However, we have to figure out why the Open MPI libraries consumed so much CPU before we can see the performance improvement.
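
            For reference, a profile like the attached opreport-l.out files can be collected roughly as follows. This is a minimal sketch using the legacy opcontrol interface; the vmlinux path is an assumption and depends on the installed kernel debuginfo:

            $ opcontrol --vmlinux=/usr/lib/debug/lib/modules/$(uname -r)/vmlinux
            $ opcontrol --start
            $ mpirun -np 8 IOR -t 1m -b 32g -w -r -vv -F -o /lustre/ior.out/file
            $ opcontrol --dump
            $ opcontrol --stop
            # Per-symbol breakdown, including mca_btl_sm.so and libopen-pal.so
            $ opreport -l > opreport-l.out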

            jay Jinshan Xiong (Inactive) added a comment

            Frederik, sorry for the delayed response. From the test results, it looks like there may be some issues with the LU-2139 patches. You can see it in the collectl stats:

            46 45 72551 10187 2G 10M 410M 399M 188M 162M 0 0 379904 371
            47 47 74169 7109 2G 11M 786M 775M 269M 162M 0 0 385024 376
            25 25 40289 4190 2G 11M 982M 971M 312M 162M 0 0 200704 196
            6 6 8440 229 2G 11M 982M 971M 313M 162M 0 0 0 0
            7 7 10545 249 2G 11M 982M 971M 313M 164M 0 0 0 0

            .....

            {20 seconds later}

            7 7 10639 241 2G 11M 983M 973M 311M 163M 0 0 0 0
            9 8 12408 236 2G 11M 983M 973M 311M 163M 0 0 0 0
            34 34 53022 4218 1G 11M 1G 1G 357M 163M 0 0 258048 252
            50 50 77645 7414 1G 11M 1G 1G 447M 163M 0 0 422912 413

            There was IO activity for 2 or 3 seconds, then things stayed quiet for around 20 seconds, and then IO started again. It seems like the LRU budget was running out, so the OSC had to wait for the commit on the OST to finish.

            I will work on this. Thanks for testing.
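
            To see whether the client page LRU budget is the limit in a run like this, the relevant client tunables can be read while the test is running. A minimal sketch, assuming the standard client parameter names (these can differ slightly between Lustre versions):

            # Upper limit on the Lustre page cache on this client, shared by all OSCs
            $ lctl get_param llite.*.max_cached_mb

            # Per-OSC dirty limit and the amount of dirty data currently cached for each OST
            $ lctl get_param osc.*.max_dirty_mb osc.*.cur_dirty_bytes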

            jay Jinshan Xiong (Inactive) added a comment

            Jinshan, I tested master + patch 4519 on both the servers and the client, but the results still seem to be the same.

            ihara Shuichi Ihara (Inactive) added a comment

            So far all these tests have been done with 2.3.0 on the servers. I've not tried 2.3.54 on any of my test servers yet. I'll try to find some time over the next few days.

            ferner Frederik Ferner (Inactive) added a comment

            Frederik, I'm assuming for your test results that you are running the same version on both the client and the server? Would it also be possible for you to test 2.3.0 clients with 2.3.54 servers and vice versa? That would allow us to isolate whether the slowdown seen with 2.3.54 is due to changes in the client or the server.

            adilger Andreas Dilger added a comment

            People

              jay Jinshan Xiong (Inactive)
              ihara Shuichi Ihara (Inactive)
              Votes: 1
              Watchers: 35
