implement ladvise rpc_size for optimized performance

Details

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Minor

    Description

      Currently, the maximum RPC size for bulk I/O is controlled by the
      per-OSC parameter max_pages_per_rpc. Whenever possible, the OSC
      issues bulk I/O RPCs as large as max_pages_per_rpc for better
      performance, so changing max_pages_per_rpc usually has a large
      effect on I/O performance. However, because I/O patterns differ,
      not all applications get their best performance with the same
      value of max_pages_per_rpc.

      We want to add a new type of ladvise that enables applications to
      set different RPC sizes for different files. max_pages_per_rpc
      remains the upper limit of the RPC size in all cases. A new
      parameter, default_pages_per_rpc, has been added; its value is the
      default RPC size. If an rpc_size ladvise is given for a file, the
      RPC size of that file is changed according to the ladvise, but it
      is still limited by max_pages_per_rpc.
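
      The selection rule described above can be sketched as a small
      shell function. This is only an illustrative model, not code from
      the patch: the function name effective_rpc_pages and its argument
      order are hypothetical, and all values are page counts.

```shell
# Hypothetical helper modelling the rule in the description:
# use the ladvise hint if one was given, otherwise the default,
# and never exceed max_pages_per_rpc.
# Usage: effective_rpc_pages MAX_PAGES DEFAULT_PAGES [HINT_PAGES]
effective_rpc_pages() {
    max_pages=$1
    default_pages=$2
    hint_pages=${3:-}

    if [ -n "$hint_pages" ]; then
        pages=$hint_pages      # per-file hint set through ladvise
    else
        pages=$default_pages   # no hint: fall back to the default
    fi

    # max_pages_per_rpc is the upper limit in all cases
    if [ "$pages" -gt "$max_pages" ]; then
        pages=$max_pages
    fi
    echo "$pages"
}
```

      For example, with a maximum of 4096 pages and a default of 256
      pages, a hint of 1024 pages is used as given, while a hint of
      8192 pages would be reduced to 4096.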

      The RPC size of a file configured by ladvise is neither a global
      nor a persistent attribute. Each client may have a different RPC
      size for the same file, and the RPC size of the file reverts to
      default_pages_per_rpc when the hint kept in memory is lost due to
      memory shrinkage.

          Activity

            [LU-11618] implement ladvise rpc_size for optimized performance
            lixi_wc Li Xi added a comment - - edited

            The following shows how to use it:

            MGS
            $ lctl conf_param lipe1-OST*.obdfilter.brw_size=16
            $ lctl conf_param lipe1-OST*.osc.max_pages_per_rpc=16M

            Client
            $ mount -t lustre 10.0.1.148@tcp:10.0.1.149@tcp:/lipe1 /mnt/lustre_lipe1
            $ cat /proc/fs/lustre/osc/lipe1-OST0000-osc-*/default_pages_per_rpc
            4096
            $ cat /proc/fs/lustre/osc/lipe1-OST0000-osc-*/max_pages_per_rpc
            4096
            $ lfs setstripe -c 1 -i 0 /mnt/lustre_lipe1/file
            $ dd if=/dev/zero of=/mnt/lustre_lipe1/file bs=1048576 count=10
            $ cat /proc/fs/lustre/osc/lipe1-OST0000-osc-*/rpc_stats
            ...
                              read                  write
            pages per rpc     rpcs    %   cum % |   rpcs    %   cum %
            1:                   0    0       0 |      0    0       0
            2:                   0    0       0 |      0    0       0
            4:                   0    0       0 |      0    0       0
            8:                   0    0       0 |      0    0       0
            16:                  0    0       0 |      0    0       0
            32:                  0    0       0 |      0    0       0
            64:                  0    0       0 |      0    0       0
            128:                 0    0       0 |      0    0       0
            256:                 0    0       0 |      0    0       0
            512:                 0    0       0 |      0    0       0
            1024:                0    0       0 |      0    0       0
            2048:                0    0       0 |      0    0       0
            4096:                0    0       0 |      1  100     100
            ...
            $ echo > /proc/fs/lustre/osc/lipe1-OST0000-osc-*/rpc_stats
            $ echo 256 > /proc/fs/lustre/osc/lipe1-OST0000-osc-*/default_pages_per_rpc
            $ dd if=/dev/zero of=/mnt/lustre_lipe1/file bs=1048576 count=10
            $ cat /proc/fs/lustre/osc/lipe1-OST0000-osc-*/rpc_stats
            ...
                              read                  write
            pages per rpc     rpcs    %   cum % |   rpcs    %   cum %
            1:                   0    0       0 |      0    0       0
            2:                   0    0       0 |      0    0       0
            4:                   0    0       0 |      0    0       0
            8:                   0    0       0 |      0    0       0
            16:                  0    0       0 |      0    0       0
            32:                  0    0       0 |      0    0       0
            64:                  0    0       0 |      0    0       0
            128:                 0    0       0 |      0    0       0
            256:                 0    0       0 |     10  100     100
            ...
            $ echo > /proc/fs/lustre/osc/lipe1-OST0000-osc-*/rpc_stats
            $ lfs ladvise -a rpcsize -r 16M /mnt/lustre_lipe1/file
            $ dd if=/dev/zero of=/mnt/lustre_lipe1/file bs=1048576 count=16
            $ cat /proc/fs/lustre/osc/lipe1-OST0000-osc-*/rpc_stats
            ...
                              read                  write
            pages per rpc     rpcs    %   cum % |   rpcs    %   cum %
            1:                   0    0       0 |      0    0       0
            2:                   0    0       0 |      0    0       0
            4:                   0    0       0 |      0    0       0
            8:                   0    0       0 |      0    0       0
            16:                  0    0       0 |      0    0       0
            32:                  0    0       0 |      0    0       0
            64:                  0    0       0 |      0    0       0
            128:                 0    0       0 |      0    0       0
            256:                 0    0       0 |      0    0       0
            512:                 0    0       0 |      0    0       0
            1024:                0    0       0 |      0    0       0
            2048:                0    0       0 |      0    0       0
            4096:                0    0       0 |      1  100     100
            ...
            $ echo > /proc/fs/lustre/osc/lipe1-OST0000-osc-*/rpc_stats
            $ lfs ladvise -a rpcsize -r 4M /mnt/lustre_lipe1/file
            $ dd if=/dev/zero of=/mnt/lustre_lipe1/file bs=1048576 count=16
            $ cat /proc/fs/lustre/osc/lipe1-OST0000-osc-*/rpc_stats
            ...
                              read                  write
            pages per rpc     rpcs    %   cum % |   rpcs    %   cum %
            1:                   0    0       0 |      0    0       0
            2:                   0    0       0 |      0    0       0
            4:                   0    0       0 |      0    0       0
            8:                   0    0       0 |      0    0       0
            16:                  0    0       0 |      0    0       0
            32:                  0    0       0 |      0    0       0
            64:                  0    0       0 |      0    0       0
            128:                 0    0       0 |      0    0       0
            256:                 0    0       0 |      0    0       0
            512:                 0    0       0 |      0    0       0
            1024:                0    0       0 |      4  100     100
            ...
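
            As a rough cross-check of the rpc_stats output above, the expected
            RPC counts can be estimated with a few lines of shell arithmetic.
            This is a sketch under stated assumptions, not part of the patch:
            it assumes 4 KiB pages (consistent with 16M showing up as 4096
            pages), assumes every RPC is filled up to its limit, and assumes
            the histogram rounds each RPC up to the next power-of-two bucket.
            All function names are hypothetical.

```shell
PAGE_SIZE=4096  # assumption: 4 KiB client pages

# Number of pages needed to hold BYTES, rounded up.
bytes_to_pages() {
    echo $(( ($1 + PAGE_SIZE - 1) / PAGE_SIZE ))
}

# Number of RPCs needed to write TOTAL_BYTES when each RPC
# carries at most RPC_PAGES pages (ideal case, all RPCs full).
# Usage: expected_rpcs TOTAL_BYTES RPC_PAGES
expected_rpcs() {
    total_pages=$(bytes_to_pages "$1")
    echo $(( (total_pages + $2 - 1) / $2 ))
}

# Smallest power of two that is >= PAGES, i.e. the assumed
# "pages per rpc" histogram bucket for an RPC of that size.
histogram_bucket() {
    bucket=1
    while [ "$bucket" -lt "$1" ]; do
        bucket=$(( bucket * 2 ))
    done
    echo "$bucket"
}
```

            Under these assumptions, the 10 MiB write with a 4096-page limit is
            a single RPC of 2560 pages counted in the 4096 bucket, the same
            write with a 256-page limit is 10 RPCs, and the 16 MiB write with a
            4M (1024-page) hint is 4 RPCs, which matches the tables above.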

            lixi_wc Li Xi added a comment -

            Thanks Andreas. Has this stripe-size-based RPC size mechanism been changed? I can't find any related code now.

            We are thinking of passing the ladvise hint through MPI ROMIO, as in LU-6179. That would require some extra arguments when invoking the MPI run command, but no modification of the application. We are also tired of tuning the global RPC size parameter to get good performance.

            adilger Andreas Dilger added a comment -

            In the past, the RPC size was also affected by the stripe size, so that applications could specify this on a per-file basis. Also, storing the stripe_size in the file is persistent, and does not require the application to be modified.

            gerrit Gerrit Updater added a comment -

            Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/33573
            Subject: LU-11618 osc: implement ladvise rpc_size
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ce387d26f369dec24496eaed4f9213b76bb3f47f

            People

              Assignee: lixi_wc Li Xi
              Reporter: lixi_wc Li Xi
              Votes: 0
              Watchers: 5
