  1. Lustre
  2. LU-13393

t10crc4K/512 algorithm in rhel8.1 kernel is slower than rhel7.7

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.14.0
    • Labels: None
    • Environment: master, rhel8.1 (4.18.0-147.el8.x86_64)
    • Severity: 3

    Description

      Performance of the t10crc4K/512 T10-PI checksum algorithms is severely degraded on the rhel8.1 kernel.
      A client running the rhel8.1 kernel with the t10crc4K/512 checksum enabled is much slower than a client running the rhel7.7 kernel with the same checksum enabled.
      The test configuration and results are below.

      Configuration

      1 x client
      1 x Platinum 8160, 96GB memory, 1 x IB-EDR
      (lctl set_param osc.*.max_pages_per_rpc=16M osc.*.max_rpcs_in_flight=16 osc.*.max_dirty_mb=512 llite.*.max_read_ahead_mb=2048 osc.*.checksum_type=t10crc4K)
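
For context on what checksum_type=t10crc4K selects, here is a minimal Python sketch of per-sector guard tags. It assumes the "crc" variants use the standard CRC-16/T10-DIF polynomial (0x8BB7, initial value 0, no bit reflection) over 4096-byte or 512-byte sectors. The helper name guard_tags is hypothetical. This is not Lustre's code path: the kernel uses its own crc_t10dif() implementation, and how the guard tags feed into the final RPC checksum is not shown here.

```python
def crc_t10dif(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, no reflection."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def guard_tags(buf: bytes, sector_size: int) -> list:
    """Hypothetical helper: one 16-bit guard tag per sector of buf."""
    return [crc_t10dif(buf[off:off + sector_size])
            for off in range(0, len(buf), sector_size)]


page = bytes(range(256)) * 16        # one 4 KiB page of sample data
tags_4k = guard_tags(page, 4096)     # t10crc4K-style: one tag for the page
tags_512 = guard_tags(page, 512)     # t10crc512-style: eight tags for the page
print(len(tags_4k), len(tags_512))
```

A bit-at-a-time loop like this is far slower than a table-driven or PCLMULQDQ-accelerated CRC, so it is only meant to show what is being computed, not how fast it should run.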
      

      Test results on RHEL7.7 (3.10.0-1062.el7.x86_64)

      PPN=1
      mpirun  --allow-run-as-root -np 1 ior -w -r -t 1m -b 256g -e -F -o /testfs/s/file
      Max Write: 1981.81 MiB/sec (2078.07 MB/sec)
      Max Read:  2685.01 MiB/sec (2815.44 MB/sec)
      
      PPN=16
      mpirun  --allow-run-as-root -np 16 ior -w -r -t 1m -b 16g -e -F -o /testfs/file
      Max Write: 9887.55 MiB/sec (10367.84 MB/sec)
      Max Read:  11212.37 MiB/sec (11757.03 MB/sec)
      

      Test results on RHEL8.1 (4.18.0-147.el8.x86_64)

      PPN=1
      mpirun  --allow-run-as-root -np 1 ior -w -r -t 1m -b 256g -e -F -o /testfs/s/file
      Max Write: 1703.20 MiB/sec (1785.94 MB/sec)
      Max Read:  758.24 MiB/sec (795.07 MB/sec)
      
      PPN=16
      mpirun  --allow-run-as-root -np 16 ior -w -r -t 1m -b 16g -e -F -o /testfs/file
      Max Write: 6741.36 MiB/sec (7068.83 MB/sec)
      Max Read:  5821.17 MiB/sec (6103.94 MB/sec)
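
To put the two sets of IOR results side by side, the following short Python sketch computes the relative throughput change from RHEL7.7 to RHEL8.1 for each case. The MiB/sec figures are copied from the runs above. The dictionary layout and the percent_change helper are illustrative only.

```python
# (RHEL7.7, RHEL8.1) IOR throughput in MiB/sec, copied from the results above.
ior_mib_s = {
    ("PPN=1", "write"): (1981.81, 1703.20),
    ("PPN=1", "read"): (2685.01, 758.24),
    ("PPN=16", "write"): (9887.55, 6741.36),
    ("PPN=16", "read"): (11212.37, 5821.17),
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means slower)."""
    return (new - old) / old * 100.0


changes = {case: percent_change(*pair) for case, pair in ior_mib_s.items()}

for (ppn, op), change in changes.items():
    print(f"{ppn:7s} {op:5s} {change:+.1f}%")
```

On these numbers the single-process read shows the largest drop (roughly 72%), and writes are affected less than reads at both process counts.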
      

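      The checksum algorithm speed test (obd_t10_performance_test) also shows that the t10crc4K/512 algorithms are much slower on the rhel8.1 kernel than on rhel7.7: about 27x slower for t10crc4K and about 7.5x slower for t10crc512. The t10ip512/4K algorithms are not slower.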

      RHEL7.7 (3.10.0-1062.el7.x86_64)

      obd_t10_performance_test() T10 checksum algorithm t10ip512 speed = 13015 MB/s
      obd_t10_performance_test() T10 checksum algorithm t10ip4K speed = 16855 MB/s
      obd_t10_performance_test() T10 checksum algorithm t10crc512 speed = 2551 MB/s
      obd_t10_performance_test() T10 checksum algorithm t10crc4K speed = 9231 MB/s
      

      RHEL8.1 (4.18.0-147.el8.x86_64)

      obd_t10_performance_test() T10 checksum algorithm t10ip512 speed = 13395 MB/s
      obd_t10_performance_test() T10 checksum algorithm t10ip4K speed = 19267 MB/s
      obd_t10_performance_test() T10 checksum algorithm t10crc512 speed = 339 MB/s
      obd_t10_performance_test() T10 checksum algorithm t10crc4K speed = 342 MB/s
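
The slowdown per algorithm can be read straight out of the two logs. Below is a small Python sketch that parses the obd_t10_performance_test() lines quoted above and divides the RHEL7.7 speed by the RHEL8.1 speed. The regular expression and the parse helper are assumptions based only on the line format shown in this ticket.

```python
import re

# Matches lines such as:
#   obd_t10_performance_test() T10 checksum algorithm t10crc4K speed = 342 MB/s
LINE = re.compile(r"T10 checksum algorithm (\S+) speed = (\d+) MB/s")


def parse(log: str) -> dict:
    """Map algorithm name to its reported speed in MB/s."""
    return {name: int(speed) for name, speed in LINE.findall(log)}


rhel77 = parse("""
obd_t10_performance_test() T10 checksum algorithm t10ip512 speed = 13015 MB/s
obd_t10_performance_test() T10 checksum algorithm t10ip4K speed = 16855 MB/s
obd_t10_performance_test() T10 checksum algorithm t10crc512 speed = 2551 MB/s
obd_t10_performance_test() T10 checksum algorithm t10crc4K speed = 9231 MB/s
""")

rhel81 = parse("""
obd_t10_performance_test() T10 checksum algorithm t10ip512 speed = 13395 MB/s
obd_t10_performance_test() T10 checksum algorithm t10ip4K speed = 19267 MB/s
obd_t10_performance_test() T10 checksum algorithm t10crc512 speed = 339 MB/s
obd_t10_performance_test() T10 checksum algorithm t10crc4K speed = 342 MB/s
""")

# Factor > 1 means RHEL8.1 is slower than RHEL7.7 for that algorithm.
slowdown = {name: rhel77[name] / rhel81[name] for name in rhel77}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.2f}x")
```

Only the two CRC-based algorithms come out with a factor well above 1. The IP-checksum variants are slightly faster on RHEL8.1, which suggests the regression is specific to the CRC-T10DIF path rather than to the checksum framework as a whole.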
      

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Shuichi Ihara (sihara)