
Enable inline_data feature for Lustre

Details

    • New Feature
    • Resolution: Unresolved
    • Major
    • None
    • None
    • 15672

    Description

      We found that ldiskfs (ext4) directory creation is much slower than file creation.

      Under RHEL7 with ldiskfs, we got the following results.

      1. ./mdtest -d /mnt/test/ -n 100000 -i 3

      Directory creation: 52299.122 ops/second
      File creation: 106569 ops/second

      As we can see, directory creation is about half as fast as file creation. After some profiling, we found that the cost difference comes from the following call chain:

      ->ext4_mkdir()
      ->ext4_init_new_dir()
      ->ext4_append()
      ->ext4_bread()
      ->ext4_getblk()
      ->ext4_map_blocks()
      So the extra block allocation for the '.' and '..' entries costs extra time. When we enabled inline_data, which avoids the block allocation in this case, the results improved:

      Directory creation: 111276.454 ops/second
      File creation: 114920.338 ops/second

      As we can see, with inline_data enabled, directory creation performance is nearly the same as file creation.
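
To put the two mdtest runs side by side, here is a minimal sketch that computes the relative rates. The numbers are copied from the results above; the dictionary keys and the helper name `dir_to_file_ratio` are illustrative only and are not part of mdtest.

```python
# Throughput figures (ops/second) copied from the mdtest results above.
results = {
    "default": {"dir_create": 52299.122, "file_create": 106569.0},
    "inline_data": {"dir_create": 111276.454, "file_create": 114920.338},
}


def dir_to_file_ratio(run):
    """Directory creation rate as a fraction of the file creation rate."""
    return run["dir_create"] / run["file_create"]


for name, run in results.items():
    ratio = dir_to_file_ratio(run)
    print(f"{name}: directory creation runs at {ratio:.0%} of file creation")

# How much faster directory creation became once inline_data was enabled.
speedup = results["inline_data"]["dir_create"] / results["default"]["dir_create"]
print(f"inline_data directory creation speedup: {speedup:.2f}x")
```

With these figures, directory creation goes from roughly half the file creation rate to within a few percent of it, which is a little over a 2x improvement.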

      We also found that the inline_data feature helps not only directory creation but also directory reading. Consider the following case:

      1. Create 1,000,000 directories under the test directory and then run 'time ls -R ./test'

      The elapsed time dropped from 2m25s to 30s, a huge difference. This is because inline_data avoids the extra block allocation, which also speeds up reads since no extra block I/O is needed.
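
The read test above can be approximated at a much smaller scale with a short script. This is only a sketch under stated assumptions: the helper names `create_dirs` and `timed_walk` are hypothetical, the directory count is far smaller than in the report, and the script runs against a temporary directory rather than a Lustre or ldiskfs mount. Timings taken with a warm page cache will also not be comparable to the figures reported here.

```python
import os
import tempfile
import time


def create_dirs(parent, count):
    """Create `count` empty subdirectories under `parent`."""
    for i in range(count):
        os.mkdir(os.path.join(parent, f"d{i:07d}"))


def timed_walk(root):
    """Recursively list `root` (similar in spirit to `ls -R`).

    Returns (number of directories seen below root, elapsed seconds).
    """
    start = time.perf_counter()
    seen = sum(len(dirnames) for _, dirnames, _ in os.walk(root))
    return seen, time.perf_counter() - start


if __name__ == "__main__":
    count = 10_000  # hypothetical small count, far below the report's scale
    # Pass dir="/path/to/mount" to TemporaryDirectory to target a specific file system.
    with tempfile.TemporaryDirectory() as root:
        create_dirs(root, count)
        seen, elapsed = timed_walk(root)
        print(f"walked {seen} directories in {elapsed:.3f}s")
```

Running it once on a file system formatted without inline_data and once with it would give a rough comparison, provided caches are dropped between creation and traversal.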

      In general, inline_data could improve performance and also reduce space allocation.

      inline_data is now included in RHEL7. There seem to be some conflicts with the Lustre dirdata feature, but inline_data deserves our attention: adding it to Lustre should give some improvement, though we still need to confirm this under Lustre.

          Activity

            [LU-5603] Enable inline_data feature for Lustre

            adilger Andreas Dilger added a comment -

            Stephane, yes the incompatibility with upstream is known to me. That is why we haven't enabled this feature yet. It will need some ldiskfs development effort to allow these features to work together.

            sthiell Stephane Thiell added a comment -

            As far as I know, the dirdata feature is still not available in ext4 and is currently maintained as an out-of-tree feature for Lustre via an ldiskfs patch named ext4-data-in-dirent.patch. This leads to possible incompatibilities when trying to enable newer ext4 features like inline_data with Lustre. I just tried to enable it on EL 9.2 using mkfs.lustre -O inline_data ... and the underlying mke2fs command returns "The dirdata feature can not enabled with inline data feature.". If I try to format with -O ^dirdata,inline_data, the MDT complains about missing dirdata and then crashes (kernel BUG) in ldiskfs (this is with master from a couple of weeks ago).

            Is there any solution in sight so Lustre could benefit from inline_data? It seems it would give some performance benefits (maybe not for the IO-500 benchmarks but more likely for real-life use cases...) and could also greatly reduce the blocks consumed by small directories. Our own interest in inline_data is not performance-related (although that is always welcome) but rather in storing a very large number of very small directories in Lustre so that it can be used more efficiently as a MinIO disk.

            adilger Andreas Dilger added a comment -

            The inline_data feature will also help in the case of agent directory inodes created for remote/striped directories. Instead of allocating a separate directory block to hold only "." and "..", it would be possible to store the ".." entry + parent FID into the directory inode itself. There is no need to store a "." entry, since this is stored in the inode itself and can be generated from inode->i_ino and the self FID stored in trusted.lma.
            adilger Andreas Dilger added a comment - - edited

            This feature will unfortunately not help with the IO-500 mdtest-hard-write and mdtest-hard-read, since the data cannot quite fit within the MDT inode even if it were formatted with a 4KB inode size. The mdtest-hard-write parameters create 3901-byte files, which along with 160 bytes of the core inode and about 256 bytes of other xattrs are too large to fit even into a 4096-byte inode, if that were in use (which it is not).

            However, for the default 1024-byte MDT inode size, this would still improve performance for files below approximately 600 bytes in size, or directories with fewer than about 15 average-length (32-byte) filenames. There are definitely some workloads (e.g. OpenFoam, at least in some uses) that may benefit significantly from faster small/empty directory creation speed, and some workloads have a lot of small files.
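
The size limits in this comment can be checked with a rough model. The sketch below uses only the figures quoted above (160-byte core inode, about 256 bytes of other xattrs, 32-byte average filenames) plus one assumption that is not from the comment: 8 bytes of fixed header per ext4 directory entry. It ignores other overheads, such as the xattr entry holding the inline data and any per-entry dirdata, so treat the results as estimates. All function names are illustrative.

```python
# Sizes in bytes, taken from the comment above.
CORE_INODE = 160     # core inode fields
OTHER_XATTRS = 256   # approximate space used by other xattrs
AVG_NAME_LEN = 32    # average filename length

# Assumption (not from the comment): fixed header per ext4 directory entry.
DIRENT_HEADER = 8


def inline_space(inode_size):
    """Bytes left in the inode for inline data under this rough model."""
    return inode_size - CORE_INODE - OTHER_XATTRS


def fits_inline(data_size, inode_size):
    """True if `data_size` bytes fit in the space left in the inode."""
    return data_size <= inline_space(inode_size)


def max_inline_entries(inode_size):
    """Estimated number of average-length directory entries that fit inline."""
    return inline_space(inode_size) // (AVG_NAME_LEN + DIRENT_HEADER)


print("space in a 1024-byte inode:", inline_space(1024))
print("3901-byte file fits in a 4096-byte inode:", fits_inline(3901, 4096))
print("entries that fit in a 1024-byte inode:", max_inline_entries(1024))
```

Under these assumptions a 1024-byte inode leaves about 608 bytes, enough for about 15 entries, while a 3901-byte mdtest-hard file exceeds the 3680 bytes left in a 4096-byte inode.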

            ihara Shuichi Ihara (Inactive) added a comment - - edited

            Peter,
            We would like to work on this. There is a metadata performance (directory creation and removal) limit. To confirm this, we did some fundamental testing and confirmed that "inline_data" significantly improves the performance of directory metadata operations with ext4. It (or a similar approach) is a good candidate for breaking through the current limit on Lustre directory operation performance.
            However, some discussion is still required to move forward, since "inline_data" and "dirdata" are not compatible.
            First of all, we would like to test Lustre either with "inline_data", or keeping "dirdata" plus an approach similar to what "inline_data" does, and then make sure this idea works with Lustre as well.

            pjones Peter Jones added a comment -

            Hi there

            Is this an issue that you are planning to work on yourself or are you reporting it in the hope that somebody else will implement your suggestion?

            Thanks

            Peter


            People

              dongyang Dongyang Li
              wangshilong Wang Shilong (Inactive)