Lustre / LU-13801

Enable io_uring interface for Lustre client

Details

    • Improvement
    • Resolution: Done
    • Minor

    Description

      Kernels since 5.1 have implemented the io_uring interface (https://kernel.dk/io_uring.pdf) for efficient asynchronous IO submission to storage. According to posted results, io_uring is on-par with SPDK doing all of the IO in userspace. The io_uring interface is intended to replace the older libaio interface.

      With the recent performance improvements for libaio AIO/DIO, it should be possible to use the io_uring interfaces in a similar manner.

      I don't think many applications are using this interface yet, but since it provides a significant improvement over libaio it will likely become used in performance-oriented applications.

          Activity

            I think the important thing to note is that libaio is used by only a small number of applications, and its use has been actively discouraged by the kernel developers. I think the goal is for io_uring to become a widely supported non-POSIX IO interface for the kernel.

            Luckily, it appears that the libaio optimizations for Lustre also benefit io_uring, so it may be that we don't have a lot of work to do in this area.

            adilger Andreas Dilger added a comment
            wshilong Wang Shilong (Inactive) added a comment - - edited

            I did a quick test of io_uring on a local NVMe device and an ext4 filesystem.

            This is to the NVMe device directly:

                     LIBAIO          io_uring        io_uring + poll
            IOPS:    342656          377382          580258
            

            IOPS increased substantially, using the same fio command that Ihara used.

            However, comparing the results on ext4:

                     LIBAIO          io_uring        io_uring + poll
            IOPS:    312950          258104          260594
            

            It looks like io_uring did not help on a filesystem, which makes me think io_uring might not help on Lustre either. It might only help in
            cases where the target device has really low latency, and a network filesystem might not be such a case?


            sihara, one important thing we missed in the io_uring test: we might need -sqthread_poll=1, which might make a big difference for io_uring testing.

            wshilong Wang Shilong (Inactive) added a comment
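To make the suggestion above concrete, here is a minimal dry-run sketch that prints fio command lines with `--sqthread_poll=1` added for the io_uring engine only. The remaining options and the default directory are copied from Ihara's script in this thread; the `fio_cmd` helper and the `DIR` variable are illustrative names, not something used in the original tests. It only prints the commands, so nothing here shows that the option helps on Lustre, and on older kernels submission-queue polling may need elevated privileges.

```shell
#!/bin/sh
# Dry run: print the fio command line for one engine / queue-depth pair.
# DIR defaults to the directory from Ihara's script; override as needed.
DIR=${DIR:-/ai400/testdir}

fio_cmd() {
	api=$1
	qd=$2
	extra=""
	# sqthread_poll is an io_uring engine option, so skip it for libaio.
	if [ "$api" = io_uring ]; then
		extra=" --sqthread_poll=1"
	fi
	printf 'fio --name=randread --ioengine=%s --rw=randread --blocksize=4096 --iodepth=%s --direct=1 --runtime=10 --group_reporting=1 --size=8g --numjobs=1 --directory=%s%s\n' \
		"$api" "$qd" "$DIR" "$extra"
}

for api in libaio io_uring; do
	for qd in 1 32 256; do
		fio_cmd "$api" "$qd"
	done
done
```

Piping the output to `sh` would run the commands for real, assuming fio is installed and the directory exists.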

            I’d be curious to know if they help. 4K random read is a workload I'm not sure about. It probably will not help that too much...? But I’m not sure.

            paf0186 Patrick Farrell added a comment

            I guess we could get better performance with Patrick's optimized DIO patches (note that AIO shares some of the same code path).

            wshilong Wang Shilong (Inactive) added a comment
            sihara Shuichi Ihara added a comment - - edited

            I got the first test results, a comparison of libaio and io_uring with fio.
            Tested Configuration

            • AI400 (20 x Samsung NVMe) for OST/MDT
            • 1 x Client(2 x Platinum 8160, 192GB RAM, 2 x IB-EDR)
              • Ubuntu 20.04 (5.4.0-42-generic)
              • Lustre master branch (commit f384a8733c)

            Test workload (1 thread, QD=1 to 256, 4K random read)

            #!/bin/sh
            
            for api in libaio io_uring; do
            	for qd in 1 2 4 8 16 32 64 128 256; do
            		./fio -name=randread -ioengine=${api} -rw=randread -blocksize=4096 -iodepth=$qd -direct=1 -runtime=10 -group_reporting=1 -create_serialize=0 -size=8g -numjobs=1 -directory=/ai400/testdir -filename_format='f.$jobnum.$filenum'
            	done
            done
            

            Here are the results.

            QD  libaio io_uring
             1    4.2     4.4
             2    8.5     8.4
             4   16.7    16.8
             8   31.5    31.4
            16   43.8    43.5
            32   60.1    60.1
            64   90.7    93.3
            128  96.1   103.0
            256  95.1   100.0
            

            The good news is that io_uring at least didn't break Lustre, but I didn't see a huge performance benefit from a single-thread standpoint (even at high queue depths). I will play with it a bit more and collect more results.

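To put a number on "no huge benefit", the relative difference between the two columns can be worked out per queue depth. This is a small sketch using awk on the figures from the table above, in whatever unit the table reports; the `rel_diff` helper name is made up for illustration.

```shell
#!/bin/sh
# Print the relative difference of io_uring vs libaio for each queue depth.
# Input columns: QD, libaio, io_uring (copied from the results table).
rel_diff() {
	awk '{ printf "%-4s %+6.1f%%\n", $1, ($3 - $2) / $2 * 100 }'
}

rel_diff <<'EOF'
1    4.2    4.4
2    8.5    8.4
4   16.7   16.8
8   31.5   31.4
16  43.8   43.5
32  60.1   60.1
64  90.7   93.3
128 96.1  103.0
256 95.1  100.0
EOF
```

On these figures the two engines stay within about 1.2% of each other from QD=2 to QD=32, and the largest io_uring gain is roughly 7% at QD=128. These are single 10-second runs, so small differences may well be noise.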

            It mounts. On the last LWG call I talked to Peter about getting an Ubuntu 20 LTS client VM up and running. I hope to see this soon, especially now that we are seeing work like this coming down the pipeline.

            simmonsja James A Simmons added a comment

            Does it work, James? And does it perform?

            paf0186 Patrick Farrell added a comment

            As Andreas pointed out, you can fire up an Ubuntu 20.04 LTS system with Lustre clients to easily use this functionality.

            simmonsja James A Simmons added a comment

            According to patch https://review.whamcloud.com/39231 "LU-13740 build: update changelog for ubuntu kernel" the master client is able to compile against kernels up to 5.4 at least, so it should be possible to start testing this.

            adilger Andreas Dilger added a comment

            fio has supported ioengine=io_uring since 2019. I think we could quickly verify this with a newer kernel (5.1 or later), provided the Lustre client compiles against it.

            wshilong Wang Shilong (Inactive) added a comment
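Before running such a test, a client can be checked against the 5.1 threshold mentioned in the description. The sketch below parses a kernel release string; `kernel_has_io_uring` is a hypothetical helper name. A version check is only a heuristic, since a distribution kernel may backport io_uring or build without it.

```shell
#!/bin/sh
# Succeed if the given kernel release (default: the running kernel)
# is version 5.1 or newer, the first mainline release with io_uring.
kernel_has_io_uring() {
	rel=${1:-$(uname -r)}
	major=${rel%%.*}          # text before the first dot
	rest=${rel#*.}            # text after the first dot
	minor=${rest%%[!0-9]*}    # leading digits of the remainder
	[ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 1 ]; }
}

if kernel_has_io_uring; then
	echo "kernel $(uname -r): new enough for io_uring"
else
	echo "kernel $(uname -r): older than 5.1, no mainline io_uring"
fi
```

For example, `kernel_has_io_uring 5.4.0-42-generic` succeeds for the Ubuntu 20.04 kernel used elsewhere in this thread.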

            People

              wc-triage WC Triage
              adilger Andreas Dilger