[LU-11164] lvbo_*() methods to reuse env Created: 23/Jul/18  Updated: 01/Mar/19  Resolved: 06/Oct/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0

Type: Improvement Priority: Minor
Reporter: Alex Zhuravlev Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: perf_optimization

Attachments: File master-patch-32832.svg     File ost_io.svg    
Issue Links:
Related
is related to LU-11347 Do not use pagecache for SSD I/O when... Resolved
is related to LU-12034 env allocation in ptlrpc_set_wait() c... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

The lvbo methods have to allocate and initialize a new lu_env on every call, which can be quite expensive in terms of CPU cycles at scale.
The layers above can pass in their lu_env so that the existing one is reused.
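
A minimal sketch of the idea (not the actual patch: argument lists are abbreviated and do_lvb_update() is a hypothetical helper; the real methods are the ldlm lvbo callbacks such as ost_lvbo_update()):

/* Before: every invocation pays for a private environment */
static int lvbo_update_old(struct ldlm_resource *res /* , ... */)
{
        struct lu_env env;
        int rc;

        rc = lu_env_init(&env, LCT_DT_THREAD);  /* per-call setup/teardown */
        if (rc)
                return rc;
        rc = do_lvb_update(&env, res);          /* hypothetical helper */
        lu_env_fini(&env);
        return rc;
}

/* After: the caller passes its already-initialized env down,
 * so no per-call lu_env_init()/lu_env_fini() is needed */
static int lvbo_update_new(const struct lu_env *env,
                           struct ldlm_resource *res /* , ... */)
{
        return do_lvb_update(env, res);         /* hypothetical helper */
}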



 Comments   
Comment by Joseph Gmitter (Inactive) [ 28/Jul/18 ]

Patch is at https://review.whamcloud.com/#/c/32832/

Comment by James A Simmons [ 14/Aug/18 ]

I see this is marked as upstream. Is that correct?

Comment by Peter Jones [ 06/Sep/18 ]

No it's not - thanks for pointing that out!

 

Comment by Andreas Dilger [ 25/Sep/18 ]

Patrick,
attached is a flame graph (ost_io.svg) showing CPU usage for the OST under a high-throughput random read workload (fake IO is used so that no storage overhead is present, just network and RPC processing). In ost_lvbo_update() the lu_env_init() and lu_env_fini() functions are consuming over 10% of the OSS CPU for basically no benefit. The master-patch-32832.svg flame graph shows that ost_lvbo_update() CPU usage drops to 1.5% when the patch is applied, which resulted in a 6.3% performance improvement for random 4KB reads. Alex, could you please include these results in the commit comment so that it is clearer why we want to land this patch?

The ost_io.svg graph is also showing find_or_create_page() using 4.25% of CPU, which drove the creation of patch https://review.whamcloud.com/32875 "LU-11347 osd: do not use pagecache for I/O".

Comment by Shuichi Ihara [ 01/Oct/18 ]

Here are the performance results with the patches.
4KB random read from 768 processes on 32 clients with the fio job file below.

[randread]
ioengine=sync
rw=randread
blocksize=4096
iodepth=16
direct=1
size=4g
runtime=60
numjobs=24
group_reporting
directory=/cache1/fio.out
filename_format=f.$jobnum.$filenum
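
For reference, the "All clients" aggregate below is what fio's client/server mode reports; assuming hypothetical file names, such a run is typically started with "fio --server" on each client node and "fio --client=<host list file> randread.fio" from a coordinating node.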

master without patch

All clients: (groupid=0, jobs=32): err= 0: pid=0: Mon Oct  1 09:00:53 2018
   read: IOPS=564k, BW=2205MiB/s (2312MB/s)(129GiB/60006msec)
    clat (usec): min=210, max=228722, avg=1370.30, stdev=623.75
     lat (usec): min=210, max=228722, avg=1370.45, stdev=623.75
   bw (  KiB/s): min= 1192, max=11160, per=0.13%, avg=2911.36, stdev=1003.58, samples=91693
   iops        : min=  298, max= 2790, avg=727.83, stdev=250.90, samples=91693
  lat (usec)   : 250=0.01%, 500=3.09%, 750=4.03%, 1000=11.72%
  lat (msec)   : 2=71.92%, 4=9.13%, 10=0.06%, 20=0.05%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=0.19%, sys=1.70%, ctx=33884824, majf=0, minf=49349
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=33867532,0,0,0 short=0,0,0,0 dropped=0,0,0,0

master with patch 32832

All clients: (groupid=0, jobs=32): err= 0: pid=0: Mon Oct  1 00:50:54 2018
   read: IOPS=639k, BW=2497MiB/s (2619MB/s)(146GiB/60062msec)
    clat (usec): min=202, max=303137, avg=1199.86, stdev=731.66
     lat (usec): min=202, max=303137, avg=1200.01, stdev=731.65
   bw (  KiB/s): min=  624, max=12712, per=0.13%, avg=3319.10, stdev=1131.37, samples=91719
   iops        : min=  156, max= 3178, avg=829.76, stdev=282.84, samples=91719
  lat (usec)   : 250=0.01%, 500=4.35%, 750=4.64%, 1000=21.49%
  lat (msec)   : 2=65.65%, 4=3.70%, 10=0.07%, 20=0.08%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%
  cpu          : usr=0.21%, sys=1.89%, ctx=38418553, majf=0, minf=54483
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=38398687,0,0,0 short=0,0,0,0 dropped=0,0,0,0

564k vs 639k IOPS: the patch contributes a 13% performance gain (639k / 564k ≈ 1.13).

Comment by Patrick Farrell (Inactive) [ 01/Oct/18 ]

Thanks for the detailed benchmark info, Andreas and Ihara. (Sorry, I was not originally following the LU, so I did not see the earlier comment.)

Comment by Gerrit Updater [ 05/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32832/
Subject: LU-11164 ldlm: pass env to lvbo methods
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e02cb40761ff8aae3df76c4210a345420b6d4ba1

Comment by James A Simmons [ 05/Oct/18 ]

Before we close this, has anyone measured metadata performance? I wonder if there was any impact from this work.

Comment by Peter Jones [ 06/Oct/18 ]

Landed for 2.12. James, perf testing is a standard part of release testing.

Comment by Andrew Perepechko [ 25/Feb/19 ]

I'm seeing a significant performance drop (-15%) in createmany -o between 2.11 and 2.12. Testing historical commits with a small step shows about a 5% performance drop right after this patch.

Comment by Alex Zhuravlev [ 25/Feb/19 ]

any profiling information?

Comment by Andrew Perepechko [ 25/Feb/19 ]

There's some profiling information that was collected automatically during testing. I hope it can be useful.

There's no significant difference in open(O_CREAT) processing time, but close() seems to have become slower:

LU-11164:

 - open/close 5948 (time 1551093589.39 total 10.00 last 594.79)
 - open/close 10000 (time 1551093596.20 total 16.82 last 594.22)
 - open/close 15658 (time 1551093606.20 total 26.82 last 565.79)
total: 20000 open/close in 34.55 seconds: 578.94 ops/second

ll_atomic_open:22765503.0:20000:1138.0 ll_file_release:9271202.0:20000:463.0 ll_inode_revalidate:850446.0:40005:21.0 ll_file_open:326223.0:20000:16.0
lmv_intent_lock:21263116.0:60005:354.0
mdc_enqueue_base:20668025.0:20002:1033.0 mdc_close:8846957.0:20000:442.0 mdc_intent_lock:21006833.0:60005:350.0
ldlm_cli_enqueue:19807126.0:20002:990.0 ldlm_handle_enqueue0:12404488.0:20002:620.0 ldlm_cli_enqueue_fini:870123.0:20002:43.0
ptlrpc_queue_wait:27100485.0:40004:677.0 ptlrpc_server_handle_request:19790070.0:66468:297.0
mdt_reint_open:11116636.0:20000:555.0 mdt_close:2967136.0:20000:148.0 mdt_intent_getattr:147.0:2:73.0
mdd_create:5972754.0:20000:298.0 mdd_create_object:2055115.0:20000:102.0 mdd_trans_start:1641342.0:60000:27.0 mdd_trans_stop:1128474.0:60000:18.0 mdd_update_time:131643.0:20000:6.0 mdd_changelog_ns_store:11204.0:20000:0.0
ldiskfs_create_inode:1199474.0:20000:59.0
 

pre-LU-11164:

 - open/close 6245 (time 1551094193.65 total 10.00 last 624.45)
 - open/close 10000 (time 1551094199.67 total 16.02 last 623.75)
 - open/close 15804 (time 1551094209.68 total 26.02 last 580.35)
total: 20000 open/close in 33.23 seconds: 601.83 ops/second

ll_atomic_open:22203052.0:20000:1110.0 ll_file_release:8576370.0:20000:428.0 ll_inode_revalidate:823214.0:40005:20.0 ll_file_open:321143.0:20000:16.0
lmv_intent_lock:20684255.0:60005:344.0
mdc_enqueue_base:20090571.0:20002:1004.0 mdc_close:8149427.0:20000:407.0 mdc_intent_lock:20431136.0:60005:340.0
ldlm_cli_enqueue:19175889.0:20002:958.0 ldlm_handle_enqueue0:12598163.0:20002:629.0 ldlm_cli_enqueue_fini:908190.0:20002:45.0
ptlrpc_queue_wait:25720604.0:40004:642.0 ptlrpc_server_handle_request:19790636.0:41798:473.0
mdt_reint_open:11314481.0:20000:565.0 mdt_close:2930872.0:20000:146.0 mdt_intent_getattr:157.0:2:78.0
mdd_create:6212753.0:20000:310.0 mdd_create_object:2052026.0:20000:102.0 mdd_trans_start:1929037.0:60000:32.0 mdd_trans_stop:1121551.0:60000:18.0 mdd_update_time:130001.0:20000:6.0 mdd_changelog_ns_store:9368.0:20000:0.0
ldiskfs_create_inode:1182995.0:20000:59.0

Open times look similar (1138.0 us vs 1110.0 us per operation, 22.7M us vs 22.2M us total), but close times have changed (463 us vs 428 us, about an 8% difference, 9.2M us vs 8.5M us total). Each entry in the data above appears to be function:total_us:call_count:avg_us.

The results stay similar for the neighboring commits.

Comment by Alex Zhuravlev [ 25/Feb/19 ]

Did you try to collect some data with, say, oprofile?

Comment by Alex Zhuravlev [ 25/Feb/19 ]

Yet another option would be to collect rpc_stats on the server side, to see whether the slowdown is on the server or the client side.

Comment by Andrew Perepechko [ 25/Feb/19 ]

Did you try to collect some data with, say, oprofile?

No, why? It's an outdated tool. The ftrace profiling data pasted above is more accurate.

I will collect additional data for this case if you have no idea why performance dropped.

Comment by Alex Zhuravlev [ 25/Feb/19 ]

At the moment I don't understand why passing an extra argument to a few methods would affect MDS performance that much.
I will keep investigating.
Any additional data is much appreciated.

Comment by Shuichi Ihara [ 25/Feb/19 ]

Does createmany still create OST objects unless the -m (mknod) option is used? Let me confirm with/without the patch.

Comment by Andrew Perepechko [ 25/Feb/19 ]

Does createmany still create OST objects unless the -m (mknod) option is used? Let me confirm with/without the patch.

The directory in the test is configured for the DoM layout (so, with only the single 1 MiB MDT component below, the zero-length files created by createmany -o presumably allocate no OST objects):

# lfs setstripe -E 1048576 -L mdt .
# createmany -o file 1 20000

Comment by Andrew Perepechko [ 26/Feb/19 ]

I'm not an expert in this patch, but isn't it performing lu_env_init()/lu_env_fini() in ptlrpc_set_wait() when this sequence isn't really needed?

This observation seems to be backed by the posted profiling data:

mdt_close() execution times are essentially equal with and without the patch (148 us vs 146 us), and the differences between ll_file_release() and mdc_close() execution times are equal too: 463 - 442 = 21 us = 428 - 407.

Apparently, the observed performance drop happens between mdc_close() and mdt_close(), which correlates well with the possible lu_env_init()/lu_env_fini() overhead in ptlrpc_set_wait().

Comment by Andrew Perepechko [ 26/Feb/19 ]

The following change restores createmany -o performance to the pre-LU-11164 level for me:

diff --git a/lustre/ptlrpc/client.c b/lustre/ptlrpc/client.c
index 1ea3b9fdc1..f2a994c707 100644
--- a/lustre/ptlrpc/client.c
+++ b/lustre/ptlrpc/client.c
@@ -2327,7 +2327,9 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
        struct list_head *tmp;
        struct ptlrpc_request *req;
        struct l_wait_info lwi;
+#if 0
        struct lu_env _env;
+#endif
        time64_t timeout;
        int rc;
        ENTRY;
@@ -2345,6 +2347,7 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
        if (list_empty(&set->set_requests))
                 RETURN(0);
 
+#if 0
        /* ideally we want env provide by the caller all the time,
         * but at the moment that would mean a massive change in
         * LDLM while benefits would be close to zero, so just
@@ -2356,6 +2359,7 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
                        RETURN(rc);
                env = &_env;
        }
+#endif
 
         do {
                 timeout = ptlrpc_set_next_timeout(set);
@@ -2436,8 +2440,10 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
                         rc = req->rq_status;
         }
 
+#if 0
        if (env && env == &_env)
                lu_env_fini(&_env);
+#endif
 
        RETURN(rc);
 }

Comment by Andrew Perepechko [ 27/Feb/19 ]

Alex, Shuichi, what do you think about removing this allocation? Is it crucial for this patch?

Comment by Alex Zhuravlev [ 27/Feb/19 ]

panda, I'm testing https://review.whamcloud.com/#/c/34333/

Comment by Peter Jones [ 27/Feb/19 ]

Given that LU-11164 landed in 2.12.0, which is now released, it would be very helpful to move this discussion to a new Jira ticket linked to this one. That will make it easier to track getting the fix landed and included in a 2.12.x maintenance release - thanks!

Comment by Alex Zhuravlev [ 28/Feb/19 ]

panda, would you please create a new ticket as Peter suggested?

Comment by Andrew Perepechko [ 28/Feb/19 ]

bzzz, done: https://jira.whamcloud.com/browse/LU-12034

Comment by Peter Jones [ 28/Feb/19 ]

Thanks Andrew!
