Details


    Description

      The lvbo methods have to reallocate lu_env on every call, which can be quite expensive in terms of CPU cycles at scale.
      The layers above can pass their lu_env down so that the existing one is reused.
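
      To illustrate the idea, here is a minimal userspace sketch (made-up names, not the actual Lustre lvbo/lu_env code): the method gains an environment argument, reuses it when the caller supplies one, and only falls back to a private per-call environment otherwise, while the caller initializes its environment once and reuses it across many calls.

      #include <stdio.h>
      #include <string.h>

      /* Hypothetical stand-in for struct lu_env: a per-thread context that
       * is costly to set up and tear down on every call. */
      struct env {
              char scratch[64 * 1024];        /* large scratch area */
      };

      static int env_init(struct env *env)
      {
              memset(env, 0, sizeof(*env));   /* models the expensive init */
              return 0;
      }

      static void env_fini(struct env *env)
      {
              (void)env;                      /* teardown would go here */
      }

      /* lvbo-style method: reuse the caller's environment if one was passed
       * in, otherwise pay for a private one (the old behaviour). */
      static int lvbo_update(struct env *caller_env, int *lvb, int new_val)
      {
              struct env local, *env = caller_env;
              int rc = 0;

              if (env == NULL) {
                      rc = env_init(&local);  /* expensive, once per call */
                      if (rc != 0)
                              return rc;
                      env = &local;
              }

              env->scratch[0] = (char)new_val;        /* pretend to use the env */
              *lvb = new_val;                         /* the actual (tiny) update */

              if (caller_env == NULL)
                      env_fini(&local);
              return rc;
      }

      int main(void)
      {
              struct env env;
              int lvb = 0;

              env_init(&env);                 /* caller pays the cost once... */
              for (int i = 0; i < 1000000; i++)
                      lvbo_update(&env, &lvb, i);     /* ...and reuses it */
              env_fini(&env);

              printf("lvb = %d\n", lvb);
              return 0;
      }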

    Attachments

    Issue Links

    Activity

            [LU-11164] lvbo_*() methods to reuse env

            bzzz Alex Zhuravlev added a comment -

            Yet another option would be to collect rpc_stats on the server side to see whether the time is being spent on the server or on the client side.


            bzzz Alex Zhuravlev added a comment -

            Did you try to collect some data with, say, oprofile?

            panda Andrew Perepechko added a comment - edited

            Here is some profiling information collected automatically during testing; I hope it is useful.

            There's no significant difference in open(O_CREAT) processing time, but close() seems to have become slower (each per-function entry below reads as function : total µs : calls : average µs per call):

            LU-11164:

             - open/close 5948 (time 1551093589.39 total 10.00 last 594.79)
             - open/close 10000 (time 1551093596.20 total 16.82 last 594.22)
             - open/close 15658 (time 1551093606.20 total 26.82 last 565.79)
            total: 20000 open/close in 34.55 seconds: 578.94 ops/second
            
            ll_atomic_open:22765503.0:20000:1138.0 ll_file_release:9271202.0:20000:463.0 ll_inode_revalidate:850446.0:40005:21.0 ll_file_open:326223.0:20000:16.0
            lmv_intent_lock:21263116.0:60005:354.0
            mdc_enqueue_base:20668025.0:20002:1033.0 mdc_close:8846957.0:20000:442.0 mdc_intent_lock:21006833.0:60005:350.0
            ldlm_cli_enqueue:19807126.0:20002:990.0 ldlm_handle_enqueue0:12404488.0:20002:620.0 ldlm_cli_enqueue_fini:870123.0:20002:43.0
            ptlrpc_queue_wait:27100485.0:40004:677.0 ptlrpc_server_handle_request:19790070.0:66468:297.0
            mdt_reint_open:11116636.0:20000:555.0 mdt_close:2967136.0:20000:148.0 mdt_intent_getattr:147.0:2:73.0
            mdd_create:5972754.0:20000:298.0 mdd_create_object:2055115.0:20000:102.0 mdd_trans_start:1641342.0:60000:27.0 mdd_trans_stop:1128474.0:60000:18.0 mdd_update_time:131643.0:20000:6.0 mdd_changelog_ns_store:11204.0:20000:0.0
            ldiskfs_create_inode:1199474.0:20000:59.0
             

            pre-LU-11164:

             - open/close 6245 (time 1551094193.65 total 10.00 last 624.45)
             - open/close 10000 (time 1551094199.67 total 16.02 last 623.75)
             - open/close 15804 (time 1551094209.68 total 26.02 last 580.35)
            total: 20000 open/close in 33.23 seconds: 601.83 ops/second
            
            ll_atomic_open:22203052.0:20000:1110.0 ll_file_release:8576370.0:20000:428.0 ll_inode_revalidate:823214.0:40005:20.0 ll_file_open:321143.0:20000:16.0
            lmv_intent_lock:20684255.0:60005:344.0
            mdc_enqueue_base:20090571.0:20002:1004.0 mdc_close:8149427.0:20000:407.0 mdc_intent_lock:20431136.0:60005:340.0
            ldlm_cli_enqueue:19175889.0:20002:958.0 ldlm_handle_enqueue0:12598163.0:20002:629.0 ldlm_cli_enqueue_fini:908190.0:20002:45.0
            ptlrpc_queue_wait:25720604.0:40004:642.0 ptlrpc_server_handle_request:19790636.0:41798:473.0
            mdt_reint_open:11314481.0:20000:565.0 mdt_close:2930872.0:20000:146.0 mdt_intent_getattr:157.0:2:78.0
            mdd_create:6212753.0:20000:310.0 mdd_create_object:2052026.0:20000:102.0 mdd_trans_start:1929037.0:60000:32.0 mdd_trans_stop:1121551.0:60000:18.0 mdd_update_time:130001.0:20000:6.0 mdd_changelog_ns_store:9368.0:20000:0.0
            ldiskfs_create_inode:1182995.0:20000:59.0
            

            open times look similar (1138.0 µs vs 1110.0 µs per operation, 22.7 million µs vs 22.2 million µs total), but close times have regressed (463 µs vs 428 µs per operation, about an 8% difference, 9.2 million µs vs 8.5 million µs total).

            The results stay similar for the neighboring commits.


            bzzz Alex Zhuravlev added a comment -

            Any profiling information?


            panda Andrew Perepechko added a comment -

            I'm seeing a significant performance drop (-15%) in createmany -o between 2.11 and 2.12. Testing historical commits with a small step shows about a 5% performance drop right after this patch.

            pjones Peter Jones added a comment -

            Landed for 2.12. James, perf testing is a standard part of release testing.


            simmonsja James A Simmons added a comment -

            Before we close this, has anyone measured metadata performance? I wonder if there was any impact from this work.


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32832/
            Subject: LU-11164 ldlm: pass env to lvbo methods
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e02cb40761ff8aae3df76c4210a345420b6d4ba1


            paf Patrick Farrell (Inactive) added a comment -

            Thanks for the detailed benchmark info, Andreas, Ihara. (Sorry, I was not originally following the LU, so I did not see the earlier comment.)


            sihara Shuichi Ihara added a comment -

            Here are the performance results with the patch.
            4K random read from 768 processes on 32 clients, using the fio job file below.

            [randread]
            ioengine=sync
            rw=randread
            blocksize=4096
            iodepth=16
            direct=1
            size=4g
            runtime=60
            numjobs=24
            group_reporting
            directory=/cache1/fio.out
            filename_format=f.$jobnum.$filenum
            

            master without patch

            All clients: (groupid=0, jobs=32): err= 0: pid=0: Mon Oct  1 09:00:53 2018
               read: IOPS=564k, BW=2205Mi (2312M)(129GiB/60006msec)
                clat (usec): min=210, max=228722, avg=1370.30, stdev=623.75
                 lat (usec): min=210, max=228722, avg=1370.45, stdev=623.75
               bw (  KiB/s): min= 1192, max=11160, per=0.13%, avg=2911.36, stdev=1003.58, samples=91693
               iops        : min=  298, max= 2790, avg=727.83, stdev=250.90, samples=91693
              lat (usec)   : 250=0.01%, 500=3.09%, 750=4.03%, 1000=11.72%
              lat (msec)   : 2=71.92%, 4=9.13%, 10=0.06%, 20=0.05%, 50=0.01%
              lat (msec)   : 100=0.01%, 250=0.01%
              cpu          : usr=0.19%, sys=1.70%, ctx=33884824, majf=0, minf=49349
              IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
                 submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
                 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
                 issued rwts: total=33867532,0,0,0 short=0,0,0,0 dropped=0,0,0,0
            

            master with patch 32832

            All clients: (groupid=0, jobs=32): err= 0: pid=0: Mon Oct  1 00:50:54 2018
               read: IOPS=639k, BW=2497Mi (2619M)(146GiB/60062msec)
                clat (usec): min=202, max=303137, avg=1199.86, stdev=731.66
                 lat (usec): min=202, max=303137, avg=1200.01, stdev=731.65
               bw (  KiB/s): min=  624, max=12712, per=0.13%, avg=3319.10, stdev=1131.37, samples=91719
               iops        : min=  156, max= 3178, avg=829.76, stdev=282.84, samples=91719
              lat (usec)   : 250=0.01%, 500=4.35%, 750=4.64%, 1000=21.49%
              lat (msec)   : 2=65.65%, 4=3.70%, 10=0.07%, 20=0.08%, 50=0.01%
              lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%
              cpu          : usr=0.21%, sys=1.89%, ctx=38418553, majf=0, minf=54483
              IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
                 submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
                 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
                 issued rwts: total=38398687,0,0,0 short=0,0,0,0 dropped=0,0,0,0
            

            564k vs 639k IOPS: the patch contributes roughly a 13% performance gain (639/564 ≈ 1.13).


            adilger Andreas Dilger added a comment -

            Patrick,
            attached is a flame graph (ost_io.svg) showing CPU usage for the OST under a high-throughput random read workload (fake IO was used so that no storage overhead is present, just network and RPC processing). In ost_lvbo_update(), the lu_env_init() and lu_env_fini() functions are consuming over 10% of the OSS CPU for basically no benefit. The master-patch-32832.svg flame graph shows that ost_lvbo_update() CPU usage is down to 1.5% with the patch applied, which resulted in a 6.3% performance improvement for random 4KB reads. Alex, could you please include these results in the commit comment so that it is clearer why we want to land this patch?

            The ost_io.svg graph also shows find_or_create_page() using 4.25% of the CPU, which drove the creation of patch https://review.whamcloud.com/32875 "LU-11347 osd: do not use pagecache for I/O".
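
            To make the shape of that overhead concrete, here is a small self-contained userspace sketch (made-up env structure, not Lustre code; the absolute numbers will not match the real lu_env cost) that compares paying for context setup/teardown on every call against setting it up once and reusing it, which is essentially what patch 32832 does for the lvbo methods:

            #include <stdio.h>
            #include <stdlib.h>
            #include <string.h>
            #include <time.h>

            /* Made-up stand-in for a per-thread environment; only the relative
             * cost of repeated setup/teardown matters here. */
            struct env {
                    unsigned char *buf;
                    size_t size;
            };

            static void env_init(struct env *e, size_t size)
            {
                    e->buf = malloc(size);
                    if (e->buf == NULL)
                            exit(1);
                    e->size = size;
                    memset(e->buf, 0, size);        /* models lu_env_init()-style work */
            }

            static void env_fini(struct env *e)
            {
                    free(e->buf);                   /* models lu_env_fini() */
            }

            static void do_update(struct env *e)
            {
                    e->buf[0]++;                    /* the "real" work is tiny */
            }

            static double now_sec(void)
            {
                    struct timespec ts;

                    clock_gettime(CLOCK_MONOTONIC, &ts);
                    return ts.tv_sec + ts.tv_nsec / 1e9;
            }

            int main(void)
            {
                    const int iters = 200000;
                    const size_t sz = 64 * 1024;
                    struct env e;
                    double t0, t1, t2;

                    t0 = now_sec();
                    for (int i = 0; i < iters; i++) {       /* old path: init/fini per call */
                            env_init(&e, sz);
                            do_update(&e);
                            env_fini(&e);
                    }
                    t1 = now_sec();

                    env_init(&e, sz);                       /* new path: set up once, reuse */
                    for (int i = 0; i < iters; i++)
                            do_update(&e);
                    env_fini(&e);
                    t2 = now_sec();

                    printf("per-call init/fini: %.3f s, reused env: %.3f s\n",
                           t1 - t0, t2 - t1);
                    return 0;
            }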


            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: bzzz Alex Zhuravlev
              Votes: 0
              Watchers: 9
