
Enable io_uring interface for Lustre client

Details

    • Type: Improvement
    • Resolution: Done
    • Priority: Minor
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      Kernels since 5.1 have implemented the io_uring interface (https://kernel.dk/io_uring.pdf) for efficient asynchronous IO submission to storage. According to posted results, io_uring is on-par with SPDK doing all of the IO in userspace. The io_uring interface is intended to replace the older libaio interface.

      With the recent performance improvements for libaio AIO/DIO, it should be possible to use the io_uring interfaces in a similar manner.

      I don't think many applications are using this interface yet, but since it provides a significant improvement over libaio it will likely be adopted by performance-oriented applications.
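
      For reference, the submit/complete model that io_uring exposes looks roughly like the following minimal sketch using liburing (the file path, queue depth, and buffer size are arbitrary placeholders, not taken from this ticket):

      #define _GNU_SOURCE
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <liburing.h>

      int main(void)
      {
              struct io_uring ring;
              struct io_uring_sqe *sqe;
              struct io_uring_cqe *cqe;
              char *buf;
              int fd;

              /* Queue depth of 8 is an arbitrary example value. */
              if (io_uring_queue_init(8, &ring, 0) < 0)
                      return 1;

              /* Placeholder path; any O_DIRECT-capable file works. */
              fd = open("/mnt/lustre/testfile", O_RDONLY | O_DIRECT);
              if (fd < 0)
                      return 1;

              if (posix_memalign((void **)&buf, 4096, 4096))
                      return 1;

              /* Submission side: grab an SQE, describe the read, submit. */
              sqe = io_uring_get_sqe(&ring);
              io_uring_prep_read(sqe, fd, buf, 4096, 0);
              if (io_uring_submit(&ring) < 0)
                      return 1;

              /* Completion side: reap the CQE when the IO finishes. */
              if (io_uring_wait_cqe(&ring, &cqe) == 0) {
                      printf("read returned %d\n", cqe->res);
                      io_uring_cqe_seen(&ring, cqe);
              }

              io_uring_queue_exit(&ring);
              return 0;
      }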

      Attachments

        Issue Links

          Activity

            [LU-13801] Enable io_uring interface for Lustre client

            adilger Andreas Dilger added a comment -

            Closing this ticket as it looks like we don't need to do anything to enable io_uring for Lustre.

            adilger Andreas Dilger added a comment -

            I think the important thing to note is that libaio is only used by a small number of applications, and its use has actively been discouraged by the kernel developers. I think the goal for the future is that io_uring will be widely supported as a non-POSIX IO interface for the kernel.

            Luckily, it appears that the libaio optimizations for Lustre also benefit io_uring, so it may be that we don't have a lot of work to do in this area.
            wshilong Wang Shilong (Inactive) added a comment - - edited

            I did a quick test of io_uring on a local NVMe device and an ext4 filesystem:

            This is against the NVMe device directly:

                     LIBAIO          io_uring        io_uring + poll
            IOPS:    342656          377382          580258
            

            We can see IOPS improve very nicely with the same fio command Ihara used above.

            However, comparing results on ext4:

                     LIBAIO          io_uring        io_uring + poll
            IOPS:    312950          258104          260594
            

            It looks like io_uring did not help on the filesystem, which makes me think io_uring might not help on Lustre either. It may only be helpful in cases where the target device has really low latency, and a network filesystem might not be such a case.


            wshilong Wang Shilong (Inactive) added a comment -

            sihara One important thing we missed for the io_uring test: we might need -sqthread_poll=1, which could make a big difference for io_uring testing.
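
            (For reference, fio's sqthread_poll option requests io_uring's submission-queue polling mode; a minimal sketch of setting that up directly with liburing might look like the following. The queue depth and idle timeout below are arbitrary example values, not from this ticket, and the "io_uring + poll" column above may instead refer to completion polling, which is a separate flag.)

            #include <string.h>
            #include <liburing.h>

            /*
             * Sketch only: create a ring with submission-queue polling
             * (IORING_SETUP_SQPOLL), which is what fio's sqthread_poll
             * option asks for. A kernel thread then polls the SQ so
             * submissions can avoid a syscall per IO.
             */
            static int setup_sqpoll_ring(struct io_uring *ring)
            {
                    struct io_uring_params p;

                    memset(&p, 0, sizeof(p));
                    p.flags = IORING_SETUP_SQPOLL;   /* kernel thread polls the SQ */
                    p.sq_thread_idle = 2000;         /* ms before the poll thread sleeps */

                    /* Note: SQPOLL may require elevated privileges on older kernels. */
                    return io_uring_queue_init_params(128, ring, &p);
            }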

            paf0186 Patrick Farrell added a comment -

            I'd be curious to know if they help - 4K random read is a workload I'm not sure about. It probably will not help that too much...? But I'm not sure.

            wshilong Wang Shilong (Inactive) added a comment -

            I guess we could get better performance with Patrick's optimized DIO patches (note that even AIO shares some of that code path).
            sihara Shuichi Ihara added a comment - - edited

            I got first test results and a comparison of libaio and io_uring with fio.
            Tested Configuration

            • AI400 (20 x Samsung NVMe) for OST/MDT
            • 1 x Client(2 x Platinum 8160, 192GB RAM, 2 x IB-EDR)
              • Ubuntu 20.04 (5.4.0-42-generic)
              • Lustre master branch (commit f384a8733c)

            Test workload (1 thread, QD=1 to 256, 4K random read)

            #!/bin/sh
            
            for api in libaio io_uring; do
            	for qd in 1 2 4 8 16 32 64 128 256; do
            		./fio -name=randread -ioengine=${api} -rw=randread -blocksize=4096 -iodepth=$qd -direct=1 -runtime=10 -group_reporting=1 -create_serialize=0 -size=8g -numjobs=1 -directory=/ai400/testdir -filename_format='f.$jobnum.$filenum'
            	done
            done
            

            Here are the results.

            QD  libaio io_uring
             1    4.2     4.4
             2    8.5     8.4
             4   16.7    16.8
             8   31.5    31.4
            16   43.8    43.5
            32   60.1    60.1
            64   90.7    93.3
            128  96.1   103.0
            256  95.1   100.0
            

            The good news is that at least io_uring didn't break Lustre, but I didn't see a huge performance benefit from a single-thread standpoint (even at high queue depths). I will play with it a bit more and collect more results.


            People

              Assignee: wc-triage WC Triage
              Reporter: adilger Andreas Dilger
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: