[LU-11164] lvbo_*() methods to reuse env Created: 23/Jul/18 Updated: 01/Mar/19 Resolved: 06/Oct/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Alex Zhuravlev | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | perf_optimization | ||
| Attachments: |
|
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
The lvbo methods have to reallocate lu_env every time, which can be quite expensive in terms of CPU cycles at scale. |
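To illustrate the idea, here is a minimal sketch only, not the actual patch: the callback names demo_lvbo_update_old()/demo_lvbo_update_new(), the simplified arguments, and the LCT_DT_THREAD context tag are illustrative assumptions, while lu_env_init()/lu_env_fini() are the real helpers being discussed. The change is from each lvbo_*() callback building its own lu_env to the caller passing one in:

/* Before: every lvbo_*() call pays for a lu_env setup/teardown. */
static int demo_lvbo_update_old(struct ldlm_resource *res)
{
	struct lu_env env;
	int rc;

	rc = lu_env_init(&env, LCT_DT_THREAD);	/* per-call allocation */
	if (rc != 0)
		return rc;
	/* ... update the lock value block using env ... */
	lu_env_fini(&env);			/* per-call teardown */
	return 0;
}

/* After: reuse the env already owned by the calling service thread. */
static int demo_lvbo_update_new(const struct lu_env *env,
				struct ldlm_resource *res)
{
	/* ... update the lock value block using the caller's env ... */
	return 0;
}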
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 28/Jul/18 ] |
|
Patch is at https://review.whamcloud.com/#/c/32832/ |
| Comment by James A Simmons [ 14/Aug/18 ] |
|
I see this is marked as upstream. Is that correct? |
| Comment by Peter Jones [ 06/Sep/18 ] |
|
No, it's not - thanks for pointing that out!
|
| Comment by Andreas Dilger [ 25/Sep/18 ] |
|
Patrick, The ost_io.svg |
| Comment by Shuichi Ihara [ 01/Oct/18 ] |
|
Here are the performance results with and without the patch.

[randread]
ioengine=sync
rw=randread
blocksize=4096
iodepth=16
direct=1
size=4g
runtime=60
numjobs=24
group_reporting
directory=/cache1/fio.out
filename_format=f.$jobnum.$filenum

master without patch

All clients: (groupid=0, jobs=32): err= 0: pid=0: Mon Oct 1 09:00:53 2018
read: IOPS=564k, BW=2205Mi (2312M)(129GiB/60006msec)
clat (usec): min=210, max=228722, avg=1370.30, stdev=623.75
lat (usec): min=210, max=228722, avg=1370.45, stdev=623.75
bw ( KiB/s): min= 1192, max=11160, per=0.13%, avg=2911.36, stdev=1003.58, samples=91693
iops : min= 298, max= 2790, avg=727.83, stdev=250.90, samples=91693
lat (usec) : 250=0.01%, 500=3.09%, 750=4.03%, 1000=11.72%
lat (msec) : 2=71.92%, 4=9.13%, 10=0.06%, 20=0.05%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%
cpu : usr=0.19%, sys=1.70%, ctx=33884824, majf=0, minf=49349
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=33867532,0,0,0 short=0,0,0,0 dropped=0,0,0,0
master with patch 32832

All clients: (groupid=0, jobs=32): err= 0: pid=0: Mon Oct 1 00:50:54 2018
read: IOPS=639k, BW=2497Mi (2619M)(146GiB/60062msec)
clat (usec): min=202, max=303137, avg=1199.86, stdev=731.66
lat (usec): min=202, max=303137, avg=1200.01, stdev=731.65
bw ( KiB/s): min= 624, max=12712, per=0.13%, avg=3319.10, stdev=1131.37, samples=91719
iops : min= 156, max= 3178, avg=829.76, stdev=282.84, samples=91719
lat (usec) : 250=0.01%, 500=4.35%, 750=4.64%, 1000=21.49%
lat (msec) : 2=65.65%, 4=3.70%, 10=0.07%, 20=0.08%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=0.21%, sys=1.89%, ctx=38418553, majf=0, minf=54483
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=38398687,0,0,0 short=0,0,0,0 dropped=0,0,0,0
564k vs 639k IOPS: the patch contributes a 13% performance gain. |
| Comment by Patrick Farrell (Inactive) [ 01/Oct/18 ] |
|
Thanks for the detailed benchmark info, Andreas, Ihara. (Sorry, I was not originally following the LU so did not see the earlier comment.) |
| Comment by Gerrit Updater [ 05/Oct/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32832/ |
| Comment by James A Simmons [ 05/Oct/18 ] |
|
Before we close this, has anyone measured metadata performance? I wonder if there was any impact from this work. |
| Comment by Peter Jones [ 06/Oct/18 ] |
|
Landed for 2.12. James, perf testing is a standard part of release testing. |
| Comment by Andrew Perepechko [ 25/Feb/19 ] |
|
I'm seeing a significant performance drop (-15%) in createmany -o between 2.11 and 2.12. Testing historical commits with a small step shows about a 5% performance drop right after this patch. |
| Comment by Alex Zhuravlev [ 25/Feb/19 ] |
|
any profiling information? |
| Comment by Andrew Perepechko [ 25/Feb/19 ] |
|
There's some profiling information collected during testing in automatic mode. I hope it can be useful. There's no significant difference in open(O_CREAT) processing time, but close() seems to have become slower:

patched:
- open/close 5948 (time 1551093589.39 total 10.00 last 594.79)
- open/close 10000 (time 1551093596.20 total 16.82 last 594.22)
- open/close 15658 (time 1551093606.20 total 26.82 last 565.79)
total: 20000 open/close in 34.55 seconds: 578.94 ops/second
ll_atomic_open:22765503.0:20000:1138.0
ll_file_release:9271202.0:20000:463.0
ll_inode_revalidate:850446.0:40005:21.0
ll_file_open:326223.0:20000:16.0
lmv_intent_lock:21263116.0:60005:354.0
mdc_enqueue_base:20668025.0:20002:1033.0
mdc_close:8846957.0:20000:442.0
mdc_intent_lock:21006833.0:60005:350.0
ldlm_cli_enqueue:19807126.0:20002:990.0
ldlm_handle_enqueue0:12404488.0:20002:620.0
ldlm_cli_enqueue_fini:870123.0:20002:43.0
ptlrpc_queue_wait:27100485.0:40004:677.0
ptlrpc_server_handle_request:19790070.0:66468:297.0
mdt_reint_open:11116636.0:20000:555.0
mdt_close:2967136.0:20000:148.0
mdt_intent_getattr:147.0:2:73.0
mdd_create:5972754.0:20000:298.0
mdd_create_object:2055115.0:20000:102.0
mdd_trans_start:1641342.0:60000:27.0
mdd_trans_stop:1128474.0:60000:18.0
mdd_update_time:131643.0:20000:6.0
mdd_changelog_ns_store:11204.0:20000:0.0
ldiskfs_create_inode:1199474.0:20000:59.0

pre-patch:
- open/close 6245 (time 1551094193.65 total 10.00 last 624.45)
- open/close 10000 (time 1551094199.67 total 16.02 last 623.75)
- open/close 15804 (time 1551094209.68 total 26.02 last 580.35)
total: 20000 open/close in 33.23 seconds: 601.83 ops/second
ll_atomic_open:22203052.0:20000:1110.0
ll_file_release:8576370.0:20000:428.0
ll_inode_revalidate:823214.0:40005:20.0
ll_file_open:321143.0:20000:16.0
lmv_intent_lock:20684255.0:60005:344.0
mdc_enqueue_base:20090571.0:20002:1004.0
mdc_close:8149427.0:20000:407.0
mdc_intent_lock:20431136.0:60005:340.0
ldlm_cli_enqueue:19175889.0:20002:958.0
ldlm_handle_enqueue0:12598163.0:20002:629.0
ldlm_cli_enqueue_fini:908190.0:20002:45.0
ptlrpc_queue_wait:25720604.0:40004:642.0
ptlrpc_server_handle_request:19790636.0:41798:473.0
mdt_reint_open:11314481.0:20000:565.0
mdt_close:2930872.0:20000:146.0
mdt_intent_getattr:157.0:2:78.0
mdd_create:6212753.0:20000:310.0
mdd_create_object:2052026.0:20000:102.0
mdd_trans_start:1929037.0:60000:32.0
mdd_trans_stop:1121551.0:60000:18.0
mdd_update_time:130001.0:20000:6.0
mdd_changelog_ns_store:9368.0:20000:0.0
ldiskfs_create_inode:1182995.0:20000:59.0

Open times look similar (1138.0 us vs 1110.0 us per operation, total 22.7 mln us vs 22.2 mln us), but close times have changed (463 us vs 428 us, 10% difference, total 9.2 mln us vs 8.5 mln us). The results stay similar for the neighbor commits. |
| Comment by Alex Zhuravlev [ 25/Feb/19 ] |
|
Did you try to collect some data with, say, oprofile? |
| Comment by Alex Zhuravlev [ 25/Feb/19 ] |
|
Yet another option would be to collect rpc_stats on the server side to see whether the overhead is on the server or the client side. |
| Comment by Andrew Perepechko [ 25/Feb/19 ] |
No, why? It's an outdated tool. The ftrace profiling data pasted above is more accurate. I will collect additional data for this case if you have no idea why performance dropped. |
| Comment by Alex Zhuravlev [ 25/Feb/19 ] |
|
At the moment I don't understand why passing an extra argument to a few methods would affect MDS performance that much. |
| Comment by Shuichi Ihara [ 25/Feb/19 ] |
|
Does it still create OST objects unless the -m (mknod) option is given? Let me confirm with/without the patch. |
| Comment by Andrew Perepechko [ 25/Feb/19 ] |
The directory in the test is configured for the DoM layout:
# lfs setstripe -E 1048576 -L mdt .
# createmany -o file 1 20000 |
| Comment by Andrew Perepechko [ 26/Feb/19 ] |
|
I'm not an expert on this patch, but isn't it performing lu_env_init()/lu_env_fini() in ptlrpc_set_wait() when this sequence isn't really needed? This observation seems to be backed by the posted profiling data: mdt_close() execution times are essentially equal with and without the patch (146 us vs 148 us), and the differences between ll_file_release() and mdc_close() execution times are equal too (463 - 442 = 21 = 428 - 407). Apparently, the observed performance drop happens between mdc_close() and mdt_close(), which correlates with the possible lu_env_init()/lu_env_fini() overhead in ptlrpc_set_wait(). |
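For reference, a minimal sketch of the pattern in question, not the literal lustre/ptlrpc/client.c code: the function name sketch_ptlrpc_set_wait(), the LCT_DT_THREAD tag, and the exact checks are assumptions (the actual init call sits in the part of the code elided in the next comment's diff), while lu_env_init()/lu_env_fini() are the real helpers.

/* Simplified sketch: when no env is passed in, the patched code sets up
 * and tears down a lu_env on every call, i.e. one init/fini pair per
 * close RPC round trip. */
int sketch_ptlrpc_set_wait(const struct lu_env *env,
			   struct ptlrpc_request_set *set)
{
	struct lu_env _env;
	int rc = 0;

	if (env == NULL) {
		rc = lu_env_init(&_env, LCT_DT_THREAD);	/* per-call init */
		if (rc != 0)
			return rc;
		env = &_env;
	}

	/* ... wait for all requests in the set to complete ... */

	if (env == &_env)
		lu_env_fini(&_env);			/* per-call fini */
	return rc;
}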
| Comment by Andrew Perepechko [ 26/Feb/19 ] |
|
The following change restores createmany -o performance to that of the pre-patch code:

diff --git a/lustre/ptlrpc/client.c b/lustre/ptlrpc/client.c
index 1ea3b9fdc1..f2a994c707 100644
--- a/lustre/ptlrpc/client.c
+++ b/lustre/ptlrpc/client.c
@@ -2327,7 +2327,9 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
 	struct list_head *tmp;
 	struct ptlrpc_request *req;
 	struct l_wait_info lwi;
+#if 0
 	struct lu_env _env;
+#endif
 	time64_t timeout;
 	int rc;
 	ENTRY;
@@ -2345,6 +2347,7 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
 	if (list_empty(&set->set_requests))
 		RETURN(0);
 
+#if 0
 	/* ideally we want env provide by the caller all the time,
 	 * but at the moment that would mean a massive change in
 	 * LDLM while benefits would be close to zero, so just
@@ -2356,6 +2359,7 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
 			RETURN(rc);
 		env = &_env;
 	}
+#endif
 
 	do {
 		timeout = ptlrpc_set_next_timeout(set);
@@ -2436,8 +2440,10 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
 			rc = req->rq_status;
 	}
 
+#if 0
 	if (env && env == &_env)
 		lu_env_fini(&_env);
+#endif
 
 	RETURN(rc);
 }
|
| Comment by Andrew Perepechko [ 27/Feb/19 ] |
|
Alex, Shuichi, what do you think about removing this allocation? Is it crucial for this patch? |
| Comment by Alex Zhuravlev [ 27/Feb/19 ] |
|
panda, I'm testing https://review.whamcloud.com/#/c/34333/ |
| Comment by Peter Jones [ 27/Feb/19 ] |
|
Given that |
| Comment by Alex Zhuravlev [ 28/Feb/19 ] |
|
panda, would you please create a new ticket as Peter suggested? |
| Comment by Andrew Perepechko [ 28/Feb/19 ] |
| Comment by Peter Jones [ 28/Feb/19 ] |
|
Thanks Andrew! |