  Lustre / LU-13973

4K random write performance impacts on large sparse files

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.14.0
    • None
    • master
    • 3
    • 9223372036854775807

    Description

      Here is a tested workload.

      4k, random write, FPP(File per process)

      [randwrite]
      ioengine=libaio
      rw=randwrite
      blocksize=4k
      iodepth=4
      direct=1
      size=${SIZE}
      runtime=60
      numjobs=16
      group_reporting
      directory=/ai400x/out
      create_serialize=0
      filename_format=f.$jobnum.$filenum
      

      The test case: 2 clients each run 16 fio processes, and each fio process does 4k random writes to a different file.
      However, if the file size is large (128GB in this case), there is a huge performance impact. Here are two test results.

      1GB file

      # SIZE=1g /work/ihara/fio.git/fio --client=hostfile randomwrite.fio
      
      write: IOPS=16.8k, BW=65.5MiB/s (68.7MB/s)(3930MiB/60004msec); 0 zone resets
       

      128GB file

      # SIZE=128g /work/ihara/fio.git/fio --client=hostfile randomwrite.fio
      
      write: IOPS=2894, BW=11.3MiB/s (11.9MB/s)(679MiB/60039msec)
       

      As far as I observed in those two cases, collecting CPU profiles on the OSS, the 128GB case showed heavy spinlock contention in ldiskfs_mb_new_blocks() and ldiskfs_mb_normalize_request(), which accounted for 89% (14085/15823 samples) of the total ost_io_xx() time, versus 20% (1895/9296 samples) in the 1GB case. Please see the attached flamegraph.
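      As a quick sanity check, the quoted percentages follow directly from the sample counts above (a minimal shell sketch; the counts are taken verbatim from the profiles):

      ```shell
      # Recompute the share of ost_io samples spent under the mballoc spinlocks.
      awk 'BEGIN {
          printf "128GB case: %.0f%%\n", 14085 / 15823 * 100
          printf "1GB case:   %.0f%%\n",  1895 /  9296 * 100
      }'
      # prints "128GB case: 89%" and "1GB case: 20%"
      ```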

      Attachments

        Issue Links

          Activity

            [LU-13973] 4K random write performance impacts on large sparse files

            In fact, it seems that fallocate is not working properly with either patch (patchset 6 and patchset 7).

            patchset 6

            [root@ec01 ~]# time  fallocate -l 128g /ai400x/test1
            
            real	0m0.004s
            user	0m0.001s
            sys	0m0.000s
            [root@ec01 ~]# ls -l /ai400x/test1 
            -rw-r--r-- 1 root root 0 Sep 21 14:47 /ai400x/test1
            

            patchset 7

            [root@ec01 ~]# time  fallocate -l 128g /ai400x/test1
            
            real	0m0.003s
            user	0m0.001s
            sys	0m0.000s
            [root@ec01 ~]# ls -l /ai400x/test1 
            -rw-r--r-- 1 root root 0 Sep 21 15:06 /ai400x/test1
            
            sihara Shuichi Ihara added a comment -
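
            For reference, when fallocate works as expected, the file's reported size and allocated blocks should match the requested length immediately, unlike the 0-byte result above. A minimal local sketch on a filesystem with working fallocate (the path and the 1M size are illustrative, not from the ticket):

            ```shell
            # On a working filesystem, fallocate reserves blocks and sets the size at once.
            f=$(mktemp)
            fallocate -l 1M "$f"
            stat -c 'size=%s blocks=%b' "$f"   # size should be 1048576, blocks nonzero
            rm -f "$f"
            ```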
            qian_wc Qian Yingjin added a comment - - edited

            Btw, could you please measure the fallocate performance with/without the updated patches?

            i.e.

            time fallocate -l 128G test1
            time fallocate -l 256G test2

            I just want to know whether it will affect the fallocate run time.

            thanks,
            Qian

            qian_wc Qian Yingjin added a comment -

            Please try the updated fallocate patch:
            https://review.whamcloud.com/39342 LU-13765 osd-ldiskfs: Extend credit correctly for fallocate

            It just modifies one line:

            diff --git a/lustre/osd-ldiskfs/osd_io.c b/lustre/osd-ldiskfs/osd_io.c
            index 462a462cc9..689471e8a3 100644
            --- a/lustre/osd-ldiskfs/osd_io.c
            +++ b/lustre/osd-ldiskfs/osd_io.c
            @@ -2009,7 +2009,7 @@ static int osd_fallocate(const struct lu_env *env, struct dt_object *dt,
                                    break;
             
                            rc = ldiskfs_map_blocks(handle, inode, &map,
            -                                       LDISKFS_GET_BLOCKS_CREATE_UNWRIT_EXT);
            +                                       LDISKFS_GET_BLOCKS_CREATE);
                            if (rc <= 0) {
                                    CDEBUG(D_INODE, "inode #%lu: block %u: len %u: "
                                           "ldiskfs_map_blocks returned %d\n",
            
            

            Regards,
            Qian

            qian_wc Qian Yingjin added a comment -

            Hi Ihara,

            I may have found the reason; it appears to be a problem with fallocate for direct IO (not for buffered IO).

            Will make a revised patch soon.

            Regards,
            Qian

            sihara Shuichi Ihara added a comment - - edited

            Yingjin, I also thought fallocate might help, so I tried fallocate with fio (note: fio uses fallocate if the filesystem supports it) after applying patch https://review.whamcloud.com/#/c/39342/, but the problem was the same; fallocate didn't help either. Btw, overwriting files did help, e.g. creating the 128GB files and allocating all their blocks first, then doing randwrite on them.

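
            The overwrite workaround mentioned above can be sketched as follows: fully write each file once so every block is allocated before the random-write phase (the path and sizes are shrunk for illustration; the real test used 128GB files under /ai400x/out):

            ```shell
            # Pre-write the whole file so later 4k random writes hit allocated blocks
            # instead of triggering block allocation in ldiskfs mballoc.
            f=/tmp/prewrite_demo
            dd if=/dev/zero of="$f" bs=1M count=4 conv=fsync 2>/dev/null
            stat -c 'size=%s' "$f"   # size=4194304: no holes left, file is not sparse
            rm -f "$f"
            ```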
            qian_wc Qian Yingjin added a comment -

            Hi Ihara,

            Could you please first preallocate all the space via fallocate?
            i.e. run fio with fallocate enabled,
            or use the command 'fallocate -l' to preallocate all needed space,
            and then do the fio testing?

            Thanks,
            Qian


            People

              qian_wc Qian Yingjin
              sihara Shuichi Ihara
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated: