Details

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Minor

    Description

      Lustre has a server-side QoS mechanism based on the NRS TBF policy (LU-3558). The NRS TBF policy can enforce rate limits based on both NID rules and JOBID rules. However, when using JOBID-based TBF rules, jobs running on the same client affect each other's RPC rates. More precisely, a job with a high RPC rate limit may actually achieve a low RPC rate, because a job with a lower RPC rate limit can exhaust the max-in-flight-RPC limit or the max-cache-pages limit.

      To prevent this from happening, a client-side mechanism needs to be added to make RPC sending at least more fair across jobs.
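As an illustration only (a hypothetical Python sketch, not Lustre code — `FairSlotPool` and all of its names are invented here), such a client-side mechanism could grant in-flight RPC slots per job ID instead of first-come-first-served, so a slow job cannot occupy every slot:

```python
from collections import deque

class FairSlotPool:
    """Toy model: grant a limited number of in-flight RPC slots per job ID
    so that one slow job cannot occupy every slot."""

    def __init__(self, max_in_flight):
        self.free = max_in_flight
        self.in_flight = {}      # job_id -> slots currently held
        self.waiters = deque()   # job_ids blocked waiting for a slot

    def try_acquire(self, job_id):
        """Grant a slot unless a job holding fewer slots is already waiting."""
        held = self.in_flight.get(job_id, 0)
        blocked = any(self.in_flight.get(w, 0) < held for w in self.waiters)
        if self.free > 0 and not blocked:
            self.free -= 1
            self.in_flight[job_id] = held + 1
            return True
        if job_id not in self.waiters:
            self.waiters.append(job_id)
        return False

    def release(self, job_id):
        """Return a slot and hand it to the waiter holding the fewest slots."""
        self.in_flight[job_id] -= 1
        self.free += 1
        if self.waiters:
            nxt = min(self.waiters, key=lambda w: self.in_flight.get(w, 0))
            self.waiters.remove(nxt)
            self.try_acquire(nxt)
```

With this rule, a job that has exhausted the pool cannot immediately reclaim a freed slot while another job is waiting with fewer slots held.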

      Attachments

        Activity

          [LU-7982] Client side QoS based on jobid

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19317/
          Subject: LU-7982 libcfs: memory allocation without CPT for binheap
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: cbe5b45a1d157c7345bd1352c257bee22ad8d085

          gerrit Gerrit Updater added a comment

          With the updated version (patch set 2) of 19729, all busy job IDs balance their page usage much more quickly than
          before. That makes me more confident in this design.

          lixi Li Xi (Inactive) added a comment

          Patch 19729 tries to solve the same problem as 19700 in a different way. Its design
          is much more complex, maybe too complex. However, it is able to balance
          page cache usage between job IDs.

          If page cache usage is balanced from the very beginning, it remains balanced
          as long as all of those job IDs have active I/O:

          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297097
          job_id: "dd.0", used: 2731, max: 2731, idle time: 0
          job_id: "dd3.0", used: 2731, max: 2731, idle time: 0
          job_id: "dd2.0", used: 2730, max: 2730, idle time: 0
          [root@server9-Centos6-vm01 qos]# cat parallel.sh 
          #!/bin/bash
          THREADS=1
          rm /mnt/lustre/* -f
          for THREAD in `seq $THREADS`; do
                  FILE1=/mnt/lustre/file1_$THREAD
                  FILE2=/mnt/lustre/file2_$THREAD
                  FILE3=/mnt/lustre/file3_$THREAD
                  dd if=/dev/zero of=$FILE1 bs=1048576 count=10000 &
                  dd2 if=/dev/zero of=$FILE2 bs=1048576 count=10000 &
                  dd3 if=/dev/zero of=$FILE3 bs=1048576 count=10000 &
          done
          [root@server9-Centos6-vm01 qos]# sh parallel.sh
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297229
          job_id: "dd.0", used: 0, max: 2731, idle time: 4297155
          job_id: "dd3.0", used: 0, max: 2731, idle time: 4297155
          job_id: "dd2.0", used: 0, max: 2730, idle time: 4297155
          

          Then, if only one job ID is active, it reclaims all the cache pages for itself:

          [root@server9-Centos6-vm01 qos]# dd if=/dev/zero of=/mnt/lustre/file1 bs=1048576 count=10000
          ^C241+0 records in
          241+0 records out
          252706816 bytes (253 MB) copied, 6.35447 s, 39.8 MB/s
          
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8185, current time: 4297294
          job_id: "dd.0", used: 0, max: 2746, idle time: 4297292
          job_id: "dd3.0", used: 0, max: 2716, idle time: 4297292
          job_id: "dd2.0", used: 0, max: 2723, idle time: 4297290
          [root@server9-Centos6-vm01 qos]# dd if=/dev/zero of=/mnt/lustre/file1 bs=1048576 count=10000&
          [1] 2282
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297303
          job_id: "dd.0", used: 2777, max: 2777, idle time: 0
          job_id: "dd3.0", used: 0, max: 2700, idle time: 4297302
          job_id: "dd2.0", used: 0, max: 2715, idle time: 4297302
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297304
          job_id: "dd.0", used: 2777, max: 2777, idle time: 0
          job_id: "dd3.0", used: 0, max: 2700, idle time: 4297302
          job_id: "dd2.0", used: 0, max: 2715, idle time: 4297302
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297305
          job_id: "dd.0", used: 2825, max: 2825, idle time: 0
          job_id: "dd3.0", used: 0, max: 2668, idle time: 4297304
          job_id: "dd2.0", used: 0, max: 2699, idle time: 4297304
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297307
          job_id: "dd.0", used: 2921, max: 2921, idle time: 0
          job_id: "dd3.0", used: 0, max: 2604, idle time: 4297306
          job_id: "dd2.0", used: 0, max: 2667, idle time: 4297306
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297309
          job_id: "dd.0", used: 3113, max: 3113, idle time: 0
          job_id: "dd3.0", used: 0, max: 2476, idle time: 4297308
          job_id: "dd2.0", used: 0, max: 2603, idle time: 4297308
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297311
          job_id: "dd.0", used: 3497, max: 3497, idle time: 0
          job_id: "dd3.0", used: 0, max: 2220, idle time: 4297310
          job_id: "dd2.0", used: 0, max: 2475, idle time: 4297310
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297312
          job_id: "dd.0", used: 4265, max: 4265, idle time: 0
          job_id: "dd3.0", used: 0, max: 1708, idle time: 4297312
          job_id: "dd2.0", used: 0, max: 2219, idle time: 4297312
          [root@server9-Centos6-vm01 qos]# 
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297315
          job_id: "dd.0", used: 5801, max: 5801, idle time: 0
          job_id: "dd3.0", used: 0, max: 684, idle time: 4297314
          job_id: "dd2.0", used: 0, max: 1707, idle time: 4297314
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297316
          job_id: "dd.0", used: 5801, max: 5801, idle time: 0
          job_id: "dd3.0", used: 0, max: 684, idle time: 4297314
          job_id: "dd2.0", used: 0, max: 1707, idle time: 4297314
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297317
          job_id: "dd.0", used: 7509, max: 7509, idle time: 0
          job_id: "dd3.0", used: 0, max: 0, idle time: 0
          job_id: "dd2.0", used: 0, max: 683, idle time: 4297316
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297319
          job_id: "dd.0", used: 8192, max: 8192, idle time: 0
          job_id: "dd3.0", used: 0, max: 0, idle time: 0
          job_id: "dd2.0", used: 0, max: 0, idle time: 0
          

          Then, if all job IDs start I/O again, the page cache is slowly rebalanced:

          [root@server9-Centos6-vm01 qos]# sh parallel.sh 
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297447
          job_id: "dd.0", used: 8063, max: 8063, idle time: 0
          job_id: "dd3.0", used: 65, max: 65, idle time: 0
          job_id: "dd2.0", used: 64, max: 64, idle time: 0
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297454
          job_id: "dd.0", used: 7791, max: 7791, idle time: 0
          job_id: "dd3.0", used: 201, max: 201, idle time: 0
          job_id: "dd2.0", used: 200, max: 200, idle time: 0
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297455
          job_id: "dd.0", used: 6400, max: 7728, idle time: 4297455
          job_id: "dd3.0", used: 232, max: 232, idle time: 0
          job_id: "dd2.0", used: 232, max: 232, idle time: 0
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297474
          job_id: "dd.0", used: 7161, max: 7161, idle time: 0
          job_id: "dd3.0", used: 516, max: 516, idle time: 0
          job_id: "dd2.0", used: 515, max: 515, idle time: 0
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297511
          job_id: "dd.0", used: 0, max: 6446, idle time: 4297503
          job_id: "dd3.0", used: 0, max: 872, idle time: 4297503
          job_id: "dd2.0", used: 0, max: 874, idle time: 4297504
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297566
          job_id: "dd.0", used: 5694, max: 5694, idle time: 0
          job_id: "dd3.0", used: 1249, max: 1249, idle time: 0
          job_id: "dd2.0", used: 1249, max: 1249, idle time: 0
          [root@server9-Centos6-vm01 qos]# sh parallel.sh 
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297601
          job_id: "dd.0", used: 5306, max: 5306, idle time: 0
          job_id: "dd3.0", used: 1442, max: 1442, idle time: 0
          job_id: "dd2.0", used: 1444, max: 1444, idle time: 0
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297674
          job_id: "dd.0", used: 0, max: 4570, idle time: 4297652
          job_id: "dd3.0", used: 0, max: 1809, idle time: 4297653
          job_id: "dd2.0", used: 0, max: 1813, idle time: 4297653
          [root@server9-Centos6-vm01 qos]# sh parallel.sh 
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297771
          job_id: "dd.0", used: 0, max: 3751, idle time: 4297753
          job_id: "dd3.0", used: 0, max: 2221, idle time: 4297753
          job_id: "dd2.0", used: 0, max: 2220, idle time: 4297753
          [root@server9-Centos6-vm01 qos]# sh parallel.sh 
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297798
          job_id: "dd.0", used: 3719, max: 3719, idle time: 0
          job_id: "dd3.0", used: 2237, max: 2237, idle time: 0
          job_id: "dd2.0", used: 2236, max: 2236, idle time: 0
          [root@server9-Centos6-vm01 qos]# sh parallel.sh 
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297870
          job_id: "dd.0", used: 2886, max: 2886, idle time: 0
          job_id: "dd3.0", used: 2653, max: 2653, idle time: 0
          job_id: "dd2.0", used: 2653, max: 2653, idle time: 0
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297885
          job_id: "dd.0", used: 2731, max: 2731, idle time: 0
          job_id: "dd3.0", used: 2731, max: 2731, idle time: 0
          job_id: "dd2.0", used: 2730, max: 2730, idle time: 0
          [root@server9-Centos6-vm01 qos]# cat /proc/fs/lustre/osc/lustre-OST0000-osc-ffff880109d20c00/osc_cache_class 
          total: 8192, assigned: 8192, current time: 4297893
          job_id: "dd.0", used: 2730, max: 2730, idle time: 0
          job_id: "dd3.0", used: 2731, max: 2731, idle time: 0
          job_id: "dd2.0", used: 2731, max: 2731, idle time: 0
          

          As you can see, the rebalancing process is very slow, because the busy job ID only gives up one of its pages each time an RPC finishes.
          This could be optimized in the future to speed up rebalancing.
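The one-page-per-RPC rebalance step described above can be sketched as follows (a hypothetical Python simplification, not the patch's actual C code; the function name and the exact ceding condition are invented for illustration):

```python
def rebalance_on_rpc_done(quota, finished_job):
    """Toy model: when an RPC of `finished_job` completes, the job cedes
    one page of its max quota to the job currently holding the fewest
    pages, so moving N pages takes on the order of N completed RPCs."""
    starved = min(quota, key=quota.get)
    if quota[finished_job] > quota[starved] + 1:
        quota[finished_job] -= 1
        quota[starved] += 1

# One dominant job slowly gives its 8192 pages back, one RPC at a time:
quota = {"dd.0": 8192, "dd3.0": 0, "dd2.0": 0}
for _ in range(6000):
    rebalance_on_rpc_done(quota, "dd.0")
# quota converges to a roughly equal 2731/2731/2730 split
```

Since each completed RPC moves only one page, draining thousands of pages from a dominant job takes thousands of RPC completions, which is why the convergence observed above is slow.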

          lixi Li Xi (Inactive) added a comment

          Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/19729
          Subject: LU-7982 osc: qos support for page cache usage
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 795bfcfb5ee099f36e24e49ff24cca19c080bf8d

          gerrit Gerrit Updater added a comment

          Patch 19319 tries to solve the in-flight-RPC limitation problem,
          and patch 19700 tries to solve the page cache limitation problem.

          Let's assume two processes with two different job IDs (Job1 and Job2) are
          writing to the same OST, and JOBID-based TBF rules are active on that OST. One
          of the two job IDs has a larger RPC rate limit R1, and the other has a much
          smaller rate limit R2. Both R1 and R2 are much smaller than the RPC rate the
          OST can provide. The expected behavior is that one process runs at RPC rate R1
          while the other runs at R2.

          However, the actual result is that both processes run at RPC rate R2 because
          of the in-flight-RPC limitation. Since R1 is much larger than R2, Job1's RPCs
          finish much more quickly than Job2's. Eventually, almost all in-flight RPCs
          of that OSC belong to Job2, and whenever Job1 wants to send an RPC, it has to
          wait for the completion of one of Job2's RPCs. Patch 19319 tries to solve
          this problem.
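This starvation can be reproduced with a small discrete-event simulation (a hedged toy model, not a description of the actual OSC code; the FIFO-queue slot model and all numbers are invented for illustration). RPCs from both jobs enter one FIFO queue, a fixed number of in-flight slots is filled from the queue head, and each RPC occupies its slot for 1/rate seconds:

```python
import heapq
from itertools import cycle

def simulate(max_in_flight, service_time, duration):
    """Toy model of the shared in-flight-RPC limit: ready RPCs alternate
    between job1 and job2 in one FIFO queue, and a freed slot always takes
    the queue head, no matter which job's RPC just finished."""
    submit = cycle(["job1", "job2"])
    done = {"job1": 0, "job2": 0}
    events = []  # (finish_time, job) min-heap
    for _ in range(max_in_flight):
        job = next(submit)
        heapq.heappush(events, (service_time[job], job))
    while events:
        now, job = heapq.heappop(events)
        if now > duration:
            break
        done[job] += 1
        nxt = next(submit)
        heapq.heappush(events, (now + service_time[nxt], nxt))
    return done

# Job1's TBF limit allows 10 RPC/s (0.1 s each); Job2's only 1 RPC/s.
# Despite the 10x difference, both jobs complete almost the same number
# of RPCs, because Job2's slow RPCs end up holding most of the slots.
counts = simulate(8, {"job1": 0.1, "job2": 1.0}, duration=100.0)
```

In this model the completion counts of the two jobs can differ by at most the number of in-flight slots, which mirrors the observed behavior that the fast job is dragged down toward the slow job's rate.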

          With that patch, since in-flight RPCs are balanced between jobs, operations
          like direct I/O behave as expected. However, cached writes still misbehave
          because of the page cache limitation. Since Job1's RPCs flush dirty pages
          much sooner than Job2's, and newly freed pages are assigned to Job1 and Job2
          randomly, eventually all the pages belong to Job2. Thus, whenever Job1 wants
          to cache some pages, it has to wait for the completion of one of Job2's RPCs.

          Patch 19700 tries to solve this new problem. If both Job1 and Job2 are
          waiting for idle pages, it assigns each newly freed page to the job ID that
          is currently using fewer pages. I haven't tested it; it might work when each
          job has multiple processes. However, if a job has only one process, there
          might still be a problem.

          Let's assume all the pages are occupied. Most of the time, the processes of
          both Job1 and Job2 are sleeping in the binary heap, waiting for more cache pages.
          If one page is released by a finished RPC, it is assigned to Job1. If
          more than one page is released by a finished RPC, they are assigned to
          Job1 and then Job2. However, a finished RPC usually releases many pages at
          once, and that is the problem with patch 19700...
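The least-used-first assignment rule described above can be sketched like this (a hypothetical Python model; the real patch works inside the OSC layer with a binary heap, and the function name here is invented):

```python
import heapq

def assign_released_pages(pages_held, released):
    """Toy model of the 19700 rule: each page released by a finished RPC is
    granted to the waiting job that currently holds the fewest pages. A
    min-heap keyed on pages held stands in for the patch's binary heap."""
    heap = [(held, job) for job, held in pages_held.items()]
    heapq.heapify(heap)
    grants = {job: 0 for job in pages_held}
    for _ in range(released):
        held, job = heapq.heappop(heap)
        grants[job] += 1
        heapq.heappush(heap, (held + 1, job))
    return grants

# A single finished RPC typically releases many pages at once; they are
# split between the waiters page by page:
grants = assign_released_pages({"Job1": 0, "Job2": 0}, released=5)
```

Assigning page by page keeps the split fair at the moment of release, but it only arbitrates between jobs that happen to be waiting right then, which is why a large batch release from one RPC can still defeat the intended balance when each job has only one process.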

          lixi Li Xi (Inactive) added a comment

          Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/19700
          Subject: LU-7982 osc: qos support for page cache usage
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 3d6fafd9336e6a235653c7469bf329906d40850c

          gerrit Gerrit Updater added a comment
          pjones Peter Jones added a comment -

          Emoly

          Could you please review these patches?

          Thanks

          Peter


          Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/19319
          Subject: LU-7982 nrs: Add client OSC side Qos support
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 780b81f5c1ca700c3a6f9d62024ba5b614ef62c7

          gerrit Gerrit Updater added a comment

          Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/19317
          Subject: LU-7982 libcfs: memory allocation without CPT for binheap
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 48be3a4504794a7a315612abd7c7f501f8f75747

          gerrit Gerrit Updater added a comment

          People

            lixi_wc Li Xi
            lixi Li Xi (Inactive)
            Votes: 0
            Watchers: 11