Details


    Description

      The lvbo methods have to reallocate lu_env on every call, which can be quite expensive in terms of CPU cycles at scale.
      The layers above can pass their lu_env down so that the existing one is reused.
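
      To illustrate the idea, here is a minimal userspace sketch (made-up names, not the actual Lustre lvbo/lu_env code): the method gains an environment argument, reuses it when the caller supplies one, and only falls back to a private per-call environment otherwise, while the caller initializes its environment once and reuses it across many calls.

      #include <stdio.h>
      #include <string.h>

      /* Hypothetical stand-in for struct lu_env: a per-thread context that
       * is costly to set up and tear down on every call. */
      struct env {
              char scratch[64 * 1024];        /* large scratch area */
      };

      static int env_init(struct env *env)
      {
              memset(env, 0, sizeof(*env));   /* models the expensive init */
              return 0;
      }

      static void env_fini(struct env *env)
      {
              (void)env;                      /* teardown would go here */
      }

      /* lvbo-style method: reuse the caller's environment if one was passed
       * in, otherwise pay for a private one (the old behaviour). */
      static int lvbo_update(struct env *caller_env, int *lvb, int new_val)
      {
              struct env local, *env = caller_env;
              int rc = 0;

              if (env == NULL) {
                      rc = env_init(&local);  /* expensive, once per call */
                      if (rc != 0)
                              return rc;
                      env = &local;
              }

              env->scratch[0] = (char)new_val;        /* pretend to use the env */
              *lvb = new_val;                         /* the actual (tiny) update */

              if (caller_env == NULL)
                      env_fini(&local);
              return rc;
      }

      int main(void)
      {
              struct env env;
              int lvb = 0;

              env_init(&env);                 /* caller pays the cost once... */
              for (int i = 0; i < 1000000; i++)
                      lvbo_update(&env, &lvb, i);     /* ...and reuses it */
              env_fini(&env);

              printf("lvb = %d\n", lvb);
              return 0;
      }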

    Attachments

    Issue Links

    Activity

            [LU-11164] lvbo_*() methods to reuse env

            bzzz Alex Zhuravlev added a comment -

            Yet another option would be to collect rpc_stats on the server side to see whether the time is being spent on the server or on the client side.


            bzzz Alex Zhuravlev added a comment -

            Did you try to collect some data with, say, oprofile?

            panda Andrew Perepechko added a comment - edited

            Here is some profiling information collected automatically during testing; I hope it is useful.

            There's no significant difference in open(O_CREAT) processing time, but close() seems to have become slower (each per-function entry below reads as function : total µs : calls : average µs per call):

            LU-11164:

             - open/close 5948 (time 1551093589.39 total 10.00 last 594.79)
             - open/close 10000 (time 1551093596.20 total 16.82 last 594.22)
             - open/close 15658 (time 1551093606.20 total 26.82 last 565.79)
            total: 20000 open/close in 34.55 seconds: 578.94 ops/second
            
            ll_atomic_open:22765503.0:20000:1138.0 ll_file_release:9271202.0:20000:463.0 ll_inode_revalidate:850446.0:40005:21.0 ll_file_open:326223.0:20000:16.0
            lmv_intent_lock:21263116.0:60005:354.0
            mdc_enqueue_base:20668025.0:20002:1033.0 mdc_close:8846957.0:20000:442.0 mdc_intent_lock:21006833.0:60005:350.0
            ldlm_cli_enqueue:19807126.0:20002:990.0 ldlm_handle_enqueue0:12404488.0:20002:620.0 ldlm_cli_enqueue_fini:870123.0:20002:43.0
            ptlrpc_queue_wait:27100485.0:40004:677.0 ptlrpc_server_handle_request:19790070.0:66468:297.0
            mdt_reint_open:11116636.0:20000:555.0 mdt_close:2967136.0:20000:148.0 mdt_intent_getattr:147.0:2:73.0
            mdd_create:5972754.0:20000:298.0 mdd_create_object:2055115.0:20000:102.0 mdd_trans_start:1641342.0:60000:27.0 mdd_trans_stop:1128474.0:60000:18.0 mdd_update_time:131643.0:20000:6.0 mdd_changelog_ns_store:11204.0:20000:0.0
            ldiskfs_create_inode:1199474.0:20000:59.0
             

            pre-LU-11164:

             - open/close 6245 (time 1551094193.65 total 10.00 last 624.45)
             - open/close 10000 (time 1551094199.67 total 16.02 last 623.75)
             - open/close 15804 (time 1551094209.68 total 26.02 last 580.35)
            total: 20000 open/close in 33.23 seconds: 601.83 ops/second
            
            ll_atomic_open:22203052.0:20000:1110.0 ll_file_release:8576370.0:20000:428.0 ll_inode_revalidate:823214.0:40005:20.0 ll_file_open:321143.0:20000:16.0
            lmv_intent_lock:20684255.0:60005:344.0
            mdc_enqueue_base:20090571.0:20002:1004.0 mdc_close:8149427.0:20000:407.0 mdc_intent_lock:20431136.0:60005:340.0
            ldlm_cli_enqueue:19175889.0:20002:958.0 ldlm_handle_enqueue0:12598163.0:20002:629.0 ldlm_cli_enqueue_fini:908190.0:20002:45.0
            ptlrpc_queue_wait:25720604.0:40004:642.0 ptlrpc_server_handle_request:19790636.0:41798:473.0
            mdt_reint_open:11314481.0:20000:565.0 mdt_close:2930872.0:20000:146.0 mdt_intent_getattr:157.0:2:78.0
            mdd_create:6212753.0:20000:310.0 mdd_create_object:2052026.0:20000:102.0 mdd_trans_start:1929037.0:60000:32.0 mdd_trans_stop:1121551.0:60000:18.0 mdd_update_time:130001.0:20000:6.0 mdd_changelog_ns_store:9368.0:20000:0.0
            ldiskfs_create_inode:1182995.0:20000:59.0
            

            open times look similar (1138.0 µs vs 1110.0 µs per operation, 22.7 million µs vs 22.2 million µs total), but close times have regressed (463 µs vs 428 µs per operation, about an 8% difference, 9.2 million µs vs 8.5 million µs total).

            The results stay similar for the neighboring commits.


            bzzz Alex Zhuravlev added a comment -

            Any profiling information?


            panda Andrew Perepechko added a comment -

            I'm seeing a significant performance drop (-15%) in createmany -o between 2.11 and 2.12. Testing historical commits with a small step shows about a 5% performance drop right after this patch.

            pjones Peter Jones added a comment -

            Landed for 2.12. James, perf testing is a standard part of release testing.


            simmonsja James A Simmons added a comment -

            Before we close this, has anyone measured metadata performance? I wonder if there was any impact from this work.


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32832/
            Subject: LU-11164 ldlm: pass env to lvbo methods
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e02cb40761ff8aae3df76c4210a345420b6d4ba1


            paf Patrick Farrell (Inactive) added a comment -

            Thanks for the detailed benchmark info, Andreas, Ihara. (Sorry, I was not originally following the LU, so I did not see the earlier comment.)


            sihara Shuichi Ihara added a comment -

            Here are the performance results with the patch.
            4K random read from 768 processes on 32 clients, using the fio job file below.

            [randread]
            ioengine=sync
            rw=randread
            blocksize=4096
            iodepth=16
            direct=1
            size=4g
            runtime=60
            numjobs=24
            group_reporting
            directory=/cache1/fio.out
            filename_format=f.$jobnum.$filenum
            

            master without patch

            All clients: (groupid=0, jobs=32): err= 0: pid=0: Mon Oct  1 09:00:53 2018
               read: IOPS=564k, BW=2205Mi (2312M)(129GiB/60006msec)
                clat (usec): min=210, max=228722, avg=1370.30, stdev=623.75
                 lat (usec): min=210, max=228722, avg=1370.45, stdev=623.75
               bw (  KiB/s): min= 1192, max=11160, per=0.13%, avg=2911.36, stdev=1003.58, samples=91693
               iops        : min=  298, max= 2790, avg=727.83, stdev=250.90, samples=91693
              lat (usec)   : 250=0.01%, 500=3.09%, 750=4.03%, 1000=11.72%
              lat (msec)   : 2=71.92%, 4=9.13%, 10=0.06%, 20=0.05%, 50=0.01%
              lat (msec)   : 100=0.01%, 250=0.01%
              cpu          : usr=0.19%, sys=1.70%, ctx=33884824, majf=0, minf=49349
              IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
                 submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
                 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
                 issued rwts: total=33867532,0,0,0 short=0,0,0,0 dropped=0,0,0,0
            

            master with patch 32832

            All clients: (groupid=0, jobs=32): err= 0: pid=0: Mon Oct  1 00:50:54 2018
               read: IOPS=639k, BW=2497Mi (2619M)(146GiB/60062msec)
                clat (usec): min=202, max=303137, avg=1199.86, stdev=731.66
                 lat (usec): min=202, max=303137, avg=1200.01, stdev=731.65
               bw (  KiB/s): min=  624, max=12712, per=0.13%, avg=3319.10, stdev=1131.37, samples=91719
               iops        : min=  156, max= 3178, avg=829.76, stdev=282.84, samples=91719
              lat (usec)   : 250=0.01%, 500=4.35%, 750=4.64%, 1000=21.49%
              lat (msec)   : 2=65.65%, 4=3.70%, 10=0.07%, 20=0.08%, 50=0.01%
              lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%
              cpu          : usr=0.21%, sys=1.89%, ctx=38418553, majf=0, minf=54483
              IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
                 submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
                 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
                 issued rwts: total=38398687,0,0,0 short=0,0,0,0 dropped=0,0,0,0
            

            564k vs 639k IOPS: the patch contributes roughly a 13% performance gain (639/564 ≈ 1.13).


            adilger Andreas Dilger added a comment -

            Patrick,
            attached is a flame graph (ost_io.svg) showing CPU usage for the OST under a high-throughput random read workload (fake IO was used so that no storage overhead is present, just network and RPC processing). In ost_lvbo_update(), the lu_env_init() and lu_env_fini() functions are consuming over 10% of the OSS CPU for basically no benefit. The master-patch-32832.svg flame graph shows that ost_lvbo_update() CPU usage is down to 1.5% with the patch applied, which resulted in a 6.3% performance improvement for random 4KB reads. Alex, could you please include these results in the commit comment so that it is clearer why we want to land this patch?

            The ost_io.svg graph also shows find_or_create_page() using 4.25% of the CPU, which drove the creation of patch https://review.whamcloud.com/32875 "LU-11347 osd: do not use pagecache for I/O".
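
            To make the shape of that overhead concrete, here is a small self-contained userspace sketch (made-up env structure, not Lustre code; the absolute numbers will not match the real lu_env cost) that compares paying for context setup/teardown on every call against setting it up once and reusing it, which is essentially what patch 32832 does for the lvbo methods:

            #include <stdio.h>
            #include <stdlib.h>
            #include <string.h>
            #include <time.h>

            /* Made-up stand-in for a per-thread environment; only the relative
             * cost of repeated setup/teardown matters here. */
            struct env {
                    unsigned char *buf;
                    size_t size;
            };

            static void env_init(struct env *e, size_t size)
            {
                    e->buf = malloc(size);
                    if (e->buf == NULL)
                            exit(1);
                    e->size = size;
                    memset(e->buf, 0, size);        /* models lu_env_init()-style work */
            }

            static void env_fini(struct env *e)
            {
                    free(e->buf);                   /* models lu_env_fini() */
            }

            static void do_update(struct env *e)
            {
                    e->buf[0]++;                    /* the "real" work is tiny */
            }

            static double now_sec(void)
            {
                    struct timespec ts;

                    clock_gettime(CLOCK_MONOTONIC, &ts);
                    return ts.tv_sec + ts.tv_nsec / 1e9;
            }

            int main(void)
            {
                    const int iters = 200000;
                    const size_t sz = 64 * 1024;
                    struct env e;
                    double t0, t1, t2;

                    t0 = now_sec();
                    for (int i = 0; i < iters; i++) {       /* old path: init/fini per call */
                            env_init(&e, sz);
                            do_update(&e);
                            env_fini(&e);
                    }
                    t1 = now_sec();

                    env_init(&e, sz);                       /* new path: set up once, reuse */
                    for (int i = 0; i < iters; i++)
                            do_update(&e);
                    env_fini(&e);
                    t2 = now_sec();

                    printf("per-call init/fini: %.3f s, reused env: %.3f s\n",
                           t1 - t0, t2 - t1);
                    return 0;
            }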


            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: bzzz Alex Zhuravlev
              Votes: 0
              Watchers: 9
