Lustre / LU-3178

single stream write to striped file is slow with 2.X clients


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.0, Lustre 2.1.5
    • Environment: Four OSS blocks, each with 2 servers and a NEC SNA460 (aka NetApp E5560) RAID providing 8 OSTs. CentOS 6.3. Servers run Lustre 2.1.5. Clients tested: Lustre 1.8.8, 2.1.5, 2.3.63.

    Description

      Single-stream writes to a striped file perform poorly with Lustre 2.* clients. The discussion started on the hpdd-discuss mailing list (subject: "striped write performance in lustre 2.X?").

      With Lustre 1.8.* servers and clients it was fairly easy to exceed 1 GB/s with a 4- to 8-stripe file. Against Lustre 2.1.5 servers the Lustre 1.8.8 client is again the fastest option (though not as fast as it used to be). Comparing a Lustre 1.8.8 client run with a 2.3.63-ish client, including some tunings:
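
      For reference, the striped test directory can be created along these lines (a sketch only: the 4-stripe / 4 MiB / pool osb4 layout is taken from the lfs getstripe output further below, and the exact command used is not recorded in this ticket):

      # sketch: recreate the test directory layout; parameters taken from
      # the lfs getstripe output below, not the verbatim command of this run
      # (-S is the 2.x spelling of the stripe-size option; 1.8 uses -s)
      lfs setstripe -c 4 -S 4M -p osb4 /mnt/lnec/pool4/stripe_4_4096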

      1.8.8 client:

      # dd if=/dev/zero of=/mnt/lnec/pool4/stripe_4_4096/test256gbx bs=4M count=$((256/4*1000))
      268435456000 bytes (268 GB) copied, 349.921 s, 767 MB/s
      

      2.3.63 client:

      # dd if=/dev/zero of=/mnt/lnec/pool4/stripe_4_4096/test256gby bs=4M count=$((256/4*1000))
      268435456000 bytes (268 GB) copied, 689.333 s, 389 MB/s
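
      (At 256 GB the transfer is several times the client's RAM, so page-cache effects on the dd timing should be small; for shorter runs, a variant with conv=fsync would make dd include the final flush in its reported rate. Illustrative only, with a hypothetical file name:)

      # hypothetical variant, not used above: fsync before reporting throughput
      dd if=/dev/zero of=/mnt/lnec/pool4/stripe_4_4096/testfsync bs=4M count=$((256/4*1000)) conv=fsync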
      

      1.8.8 client rpc_stats (part):
      (typical OST rpcs in flight for full run)

                              read                    write
      rpcs in flight        rpcs   % cum % |       rpcs   % cum %
      0:                       0   0   0   |      11594  18  18
      1:                       0   0   0   |      13130  20  38
      2:                       0   0   0   |      16592  25  64
      3:                       0   0   0   |      16543  25  90
      4:                       0   0   0   |       3762   5  96
      5:                       0   0   0   |       1598   2  98
      6:                       0   0   0   |        338   0  99
      7:                       0   0   0   |        401   0  99
      8:                       0   0   0   |         45   0  99
      9:                       0   0   0   |          1   0 100
      
                              read                    write
      offset                rpcs   % cum % |       rpcs   % cum %
      0:                       0   0   0   |      64000  99  99
      1:                       0   0   0   |          0   0  99
      2:                       0   0   0   |          0   0  99
      4:                       0   0   0   |          0   0  99
      8:                       0   0   0   |          0   0  99
      16:                      0   0   0   |          0   0  99
      32:                      0   0   0   |          0   0  99
      64:                      0   0   0   |          0   0  99
      128:                     0   0   0   |          4   0 100
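
      (Histograms like these come from the per-OSC rpc_stats files; a typical way to capture them around a run, assuming standard lctl usage rather than the exact commands of this ticket:)

      lctl set_param osc.*.rpc_stats=clear    # reset the histograms before the dd
      # ... run the dd ...
      lctl get_param osc.*.rpc_stats          # rpcs-in-flight and offset histograms per OSC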
      
      1.8.8 client perf top:
                   2151.00  9.3% ll_ra_read_init [lustre]
                   1781.00  7.7% copy_user_generic_string [kernel.kallsyms]
                   1734.00  7.5% _spin_lock [kernel.kallsyms]
                   1300.00  5.6% lov_putref [lov]
                    781.00  3.4% radix_tree_tag_clear [kernel.kallsyms]
                    720.00  3.1% _spin_lock_irqsave [kernel.kallsyms]
                    667.00  2.9% intel_idle [kernel.kallsyms]
                    607.00  2.6% osc_destroy [osc]
                    554.00  2.4% __clear_user [kernel.kallsyms]
                    551.00  2.4% refresh_entry [lvfs]
                    451.00  2.0% kiblnd_ni_fini_pools [ko2iblnd]
                    399.00  1.7% radix_tree_tag_set [kernel.kallsyms]
                    356.00  1.5% __percpu_counter_add [kernel.kallsyms]
                    352.00  1.5% lov_queue_group_io [lov]
                    340.00  1.5% mark_page_accessed [kernel.kallsyms]
                    338.00  1.5% _spin_lock_irq [kernel.kallsyms]
                    307.00  1.3% test_clear_page_writeback [kernel.kallsyms]
                    306.00  1.3% lov_stripe_unlock [lov]
                    302.00  1.3% _spin_unlock_irqrestore [kernel.kallsyms]
                    280.00  1.2% osc_update_grant [osc]
                    272.00  1.2% __dec_zone_state [kernel.kallsyms]
                    221.00  1.0% lop_makes_rpc [osc]
      

      2.3.63 client rpc_stats
      (typical OST rpcs in flight for the full run)

                              read                    write
      rpcs in flight        rpcs   % cum % |       rpcs   % cum %
      0:                       0   0   0   |          0   0   0
      1:                       0   0   0   |      16030  25  25
      2:                       0   0   0   |      16516  25  50
      3:                       0   0   0   |      23525  36  87
      4:                       0   0   0   |       7909  12  99
      5:                       0   0   0   |          7   0  99
      6:                       0   0   0   |          5   0  99
      7:                       0   0   0   |          4   0  99
      8:                       0   0   0   |          4   0 100
      
                              read                    write
      offset                rpcs   % cum % |       rpcs   % cum %
      0:                       0   0   0   |          1   0   0
      1:                       0   0   0   |          0   0   0
      2:                       0   0   0   |          0   0   0
      4:                       0   0   0   |          0   0   0
      8:                       0   0   0   |          0   0   0
      16:                      0   0   0   |          0   0   0
      32:                      0   0   0   |          0   0   0
      64:                      0   0   0   |          0   0   0
      128:                     0   0   0   |          0   0   0
      256:                     0   0   0   |          1   0   0
      512:                     0   0   0   |          2   0   0
      1024:                    0   0   0   |          4   0   0
      2048:                    0   0   0   |          8   0   0
      4096:                    0   0   0   |         16   0   0
      8192:                    0   0   0   |         32   0   0
      16384:                   0   0   0   |         64   0   0
      32768:                   0   0   0   |        128   0   0
      65536:                   0   0   0   |        256   0   0
      131072:                  0   0   0   |        512   0   1
      262144:                  0   0   0   |       1024   1   3
      524288:                  0   0   0   |       2048   3   6
      1048576:                 0   0   0   |       4096   6  12
      2097152:                 0   0   0   |       8192  12  25
      4194304:                 0   0   0   |      16384  25  51
      8388608:                 0   0   0   |      31232  48 100
      
      2.3.63 client perf top (down to 1%):
        5.03%  [kernel]            [k] copy_user_generic_string
        3.79%  [obdclass]          [k] key_fini
        3.78%  [kernel]            [k] _spin_lock
        2.43%  [obdclass]          [k] keys_fill
        2.13%  [kernel]            [k] kmem_cache_alloc
        1.99%  [kernel]            [k] __clear_user
        1.92%  [kernel]            [k] kmem_cache_free
        1.78%  [lvfs]              [k] lprocfs_counter_add
        1.48%  [kernel]            [k] kfree
        1.34%  [kernel]            [k] __kmalloc
        1.33%  [kernel]            [k] radix_tree_delete
        1.31%  [kernel]            [k] _spin_lock_irqsave
        1.28%  [obdclass]          [k] cl_page_put
        1.15%  [kernel]            [k] radix_tree_insert
        1.13%  [lvfs]              [k] lprocfs_counter_sub
        1.04%  [osc]               [k] osc_lru_del
        1.02%  [obdclass]          [k] cl_page_find0
        1.01%  [obdclass]          [k] lu_context_key_get
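
      (The perf listings above are plain perf top samples taken while the dd runs; roughly the following, though the exact invocation is not recorded in this ticket:)

      perf top    # sample on-CPU kernel/module symbols system-wide during the dd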
      

      At Jinshan's request I ran 'collectl -scml' during the dd:

      #### 2.3.63 client:
      [root@sabi22 ~]# collectl -scml -i 15
      waiting for 15 second sample...
      #<----CPU[HYPER]-----><-----------Memory-----------><--------Lustre Client-------->
      #cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map  KBRead  Reads  KBWrite Writes
         0   0    78    192  60G 112K   1G  43M 170M  80M       0      0        0      0
         2   2  2578  10780  53G 112K   7G   5G   1G  85M       0      0   392124     96
         5   5  4639  20295  39G 112K  17G  16G   4G  85M       0      0   741103    181
         4   4  4000  16842  31G 112K  24G  22G   6G  85M       0      0   609758    149
         4   4  3046  11499  31G 112K  24G  22G   6G  85M       0      0   411785    101
         4   4  3060  11592  31G 112K  24G  22G   6G  85M       0      0   412058    101
         4   4  3096  11445  31G 112K  24G  23G   6G  85M       0      0   405231     99
         4   4  3080  11440  31G 112K  24G  23G   6G  85M       0      0   404958     99
         4   4  3052  11386  30G 112K  24G  23G   6G  85M       0      0   405777     99
         4   4  3066  11477  30G 112K  24G  23G   6G  85M       0      0   407689    100
         4   4  3044  11552  30G 112K  25G  23G   6G  85M       0      0   412058    101
         4   4  3058  11382  30G 112K  25G  23G   6G  85M       0      0   403319     98
         4   4  3095  11412  30G 112K  25G  23G   6G  85M       0      0   404958     99
         4   4  3067  11370  30G 112K  25G  24G   6G  85M       0      0   403593     99
         4   4  3083  11478  29G 112K  25G  24G   6G  85M       0      0   407689    100
         4   4  3107  11374  29G 112K  25G  24G   6G  85M       0      0   401135     98
         4   4  3094  11280  29G 112K  25G  24G   6G  85M       0      0   396766     97
         4   4  3078  11282  29G 112K  25G  24G   6G  85M       0      0   398404     97
         4   4  3098  11241  29G 112K  26G  24G   6G  85M       0      0   396493     97
         4   4  3067  11238  28G 112K  26G  24G   6G  85M       0      0   393762     96
         4   4  3080  11513  28G 112K  26G  24G   6G  85M       0      0   395947     97
         4   4  3068  11327  28G 112K  26G  24G   6G  85M       0      0   394854     96
      #<----CPU[HYPER]-----><-----------Memory-----------><--------Lustre Client-------->
      #cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map  KBRead  Reads  KBWrite Writes
         4   4  3069  11444  28G 112K  26G  25G   6G  85M       0      0   396766     97
         4   4  3060  11337  28G 112K  26G  25G   7G  85M       0      0   392397     96
         4   4  3050  11220  28G 112K  26G  25G   7G  85M       0      0   388301     95
         4   4  3076  11355  28G 112K  26G  25G   7G  85M       0      0   394854     96
         4   4  3096  11359  27G 112K  27G  25G   7G  85M       0      0   394308     96
         4   4  3049  11316  27G 112K  27G  25G   7G  85M       0      0   394854     96
         4   4  3060  11325  27G 112K  27G  25G   7G  85M       0      0   396493     97
         4   4  3058  11381  27G 112K  27G  25G   7G  85M       0      0   396220     97
         4   4  3071  11343  27G 112K  27G  26G   7G  85M       0      0   395127     96
         4   4  3015  11213  27G 112K  27G  26G   7G  85M       0      0   390485     95
         4   4  3046  11328  26G 112K  27G  26G   7G  85M       0      0   394308     96
         4   4  3069  11454  26G 112K  27G  26G   7G  85M       0      0   397858     97
         4   4  3058  11524  26G 112K  28G  26G   7G  85M       0      0   399497     98
         3   3  3033  11427  26G 112K  28G  26G   7G  85M       0      0   397858     97
         3   3  3048  11500  26G 112K  28G  26G   7G  85M       0      0   402227     98
         4   4  3056  11291  26G 112K  28G  26G   7G  85M       0      0   395127     96
         4   4  3067  11223  25G 112K  28G  27G   7G  85M       0      0   394035     96
         3   3  3055  11366  25G 112K  28G  27G   7G  85M       0      0   398404     97
         4   4  3036  11337  25G 112K  28G  27G   7G  85M       0      0   397585     97
         4   4  3057  11307  25G 112K  28G  27G   7G  85M       0      0   395947     97
         4   4  3080  11241  25G 112K  29G  27G   7G  85M       0      0   393489     96
         1   1  1392   5003  25G 112K  29G  27G   7G  80M       0      0   170667     42
      #<----CPU[HYPER]-----><-----------Memory-----------><--------Lustre Client-------->
      #cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map  KBRead  Reads  KBWrite Writes
         0   0   101    216  25G 112K  29G  27G   7G  80M       0      0        0      0
      

      Doesn't seem to fill the memory...

      Further details:

      [root@sabi22 ~]# lctl get_param llite.*.max_cached_mb
      llite.lnec-ffff8808348d2c00.max_cached_mb=
      users: 32
      max_cached_mb: 48397
      used_mb: 24610
      unused_mb: 23787
      reclaim_count: 0
      
      [root@sabi22 ~]# lctl get_param osc.*.max_dirty_mb
      osc.lnec-OST0000-osc-ffff8808348d2c00.max_dirty_mb=32
      [...]
      osc.lnec-OST0018-osc-ffff8808348d2c00.max_dirty_mb=32
      osc.lnec-OST0019-osc-ffff8808348d2c00.max_dirty_mb=32
      osc.lnec-OST001a-osc-ffff8808348d2c00.max_dirty_mb=32
      osc.lnec-OST001b-osc-ffff8808348d2c00.max_dirty_mb=32
      osc.lnec-OST001c-osc-ffff8808348d2c00.max_dirty_mb=32
      osc.lnec-OST001d-osc-ffff8808348d2c00.max_dirty_mb=32
      osc.lnec-OST001e-osc-ffff8808348d2c00.max_dirty_mb=32
      osc.lnec-OST001f-osc-ffff8808348d2c00.max_dirty_mb=32
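
      For what it's worth, 32 MB per OSC appears to be the stock max_dirty_mb default; a common experiment for single-stream striped writes (values illustrative, not something recorded in this ticket) is to raise the per-OSC dirty limit and RPC concurrency:

      # illustrative client-side tuning, not taken from this ticket
      lctl set_param osc.*.max_dirty_mb=256
      lctl set_param osc.*.max_rpcs_in_flight=32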
      
      [root@sabi22 EF]# lfs getstripe /mnt/lnec/pool4/stripe_4_4096/test256a
      /mnt/lnec/pool4/stripe_4_4096/test256a
      lmm_stripe_count:   4
      lmm_stripe_size:    4194304
      lmm_layout_gen:     0
      lmm_stripe_offset:  27
      lmm_pool:           osb4
              obdidx           objid           objid           group
                  27            6792         0x1a88                0
                  31            5891         0x1703                0
                  24            6983         0x1b47                0
                  28            5251         0x1483                0
      

      Attachments

        1. opreport_algdw__dd_2.3.63
          3.97 MB
        2. collectl-dd-0f1d647
          6 kB
        3. oprofile-dd-0f1d647
          3.91 MB
        4. collectl-dd-7bcf31e
          3 kB
        5. oprofile-dd-7bcf31e
          3.21 MB

          People

            Assignee: Jinshan Xiong (Inactive) (jay)
            Reporter: Erich Focht (efocht)
            Votes: 0
            Watchers: 9
