Lustre / LU-16691

optimize ldiskfs prealloc (PA) under random read workloads


Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.16.0, Lustre 2.15.2
    • 9223372036854775807

    Description

      In some cases, ldiskfs block allocation consumes so much CPU time that OST service threads become blocked for long periods:

      crmd[16542]:  notice: High CPU load detected: 261.019989
      crmd[16542]:  notice: High CPU load detected: 258.720001
      crmd[16542]:  notice: High CPU load detected: 265.029999
      crmd[16542]:  notice: High CPU load detected: 270.309998
      
       INFO: task ll_ost00_027:20788 blocked for more than 90 seconds.
       ll_ost00_027    D ffff92242eda9080     0 20788      2 0x00000080
       Call Trace:
       schedule+0x29/0x70
       wait_transaction_locked+0x85/0xd0 [jbd2]
       add_transaction_credits+0x278/0x310 [jbd2]
       start_this_handle+0x1a1/0x430 [jbd2]
       jbd2__journal_start+0xf3/0x1f0 [jbd2]
       __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
       osd_trans_start+0x1e7/0x570 [osd_ldiskfs]
       ofd_trans_start+0x75/0xf0 [ofd]
       ofd_attr_set+0x586/0xb00 [ofd]
       ofd_setattr_hdl+0x31d/0x960 [ofd]
       tgt_request_handle+0xb7e/0x1700 [ptlrpc]
       ptlrpc_server_handle_request+0x253/0xbd0 [ptlrpc]
       ptlrpc_main+0xc09/0x1c30 [ptlrpc]
      

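      perf profiling shows that most CPU time is spent in the preallocation code (ldiskfs_mb_use_preallocated and ldiskfs_mb_normalize_request, about 37% of cycles combined) and in spinlock acquire/release (_raw_qspin_lock and the paravirtualized unlock path, about 44% combined):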

      Samples: 86M of event 'cycles', 4000 Hz, Event count (approx.): 25480688920 lost: 0/0 drop: 0/0
      Overhead  Shared Object               Symbol
        23,81%  [kernel]                    [k] _raw_qspin_lock
        21,90%  [kernel]                    [k] ldiskfs_mb_use_preallocated
        20,16%  [kernel]                    [k] __raw_callee_save___pv_queued_spin_unlock
        15,46%  [kernel]                    [k] ldiskfs_mb_normalize_request
         1,21%  [kernel]                    [k] rwsem_spin_on_owner
         0,98%  [kernel]                    [k] native_write_msr_safe
         0,54%  [kernel]                    [k] apic_timer_interrupt
         0,51%  [kernel]                    [k] ktime_get
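To illustrate why a long preallocation list can dominate the profile, the following Python sketch is a simplified cost model, not the ldiskfs code and not the fix for this ticket. It assumes, as in the ext4 mballoc code that ldiskfs is derived from, that each allocation first walks the per-inode PA list looking for a PA covering the requested logical block (loosely, ldiskfs_mb_use_preallocated) and, on a miss, walks the whole list again while normalizing the request (loosely, ldiskfs_mb_normalize_request). The `Prealloc` class, the `allocate()`/`run()` helpers, the 512-block PA size and head-of-list insertion are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Prealloc:
    """Hypothetical stand-in for one per-inode preallocation (PA) range."""
    lstart: int  # first logical block covered by this PA
    length: int  # number of blocks covered


def allocate(pa_list: list, lblock: int, pa_len: int) -> int:
    """Model one allocation at logical block `lblock`.

    Returns the number of list entries visited, used as a proxy for CPU cost.
    """
    visited = 0
    # Pass 1 (loosely like ldiskfs_mb_use_preallocated): walk the list until
    # some PA covers lblock.
    for pa in pa_list:
        visited += 1
        if pa.lstart <= lblock < pa.lstart + pa.length:
            return visited  # hit: served from an existing PA
    # Pass 2 (loosely like ldiskfs_mb_normalize_request): on a miss, model
    # visiting every existing PA once (the overlap check itself is omitted).
    visited += len(pa_list)
    # Create a PA for the aligned window containing lblock; insert at the head.
    pa_list.insert(0, Prealloc(lstart=lblock - lblock % pa_len, length=pa_len))
    return visited


def run(offsets, pa_len: int = 512) -> tuple:
    """Replay a sequence of allocations; return (total cost, final PA count)."""
    pa_list = []
    cost = sum(allocate(pa_list, lblock, pa_len) for lblock in offsets)
    return cost, len(pa_list)


PA_LEN = 512
N = 4096

# Sequential: blocks 0..N-1, so most requests hit the PA at the list head.
seq_cost, seq_pas = run(range(N), PA_LEN)

# Scattered: one block in each of N distinct PA-sized windows, in random
# order, so no request is ever served from an existing PA.
rng = random.Random(0)
windows = list(range(N))
rng.shuffle(windows)
scattered = [w * PA_LEN + rng.randrange(PA_LEN) for w in windows]
rnd_cost, rnd_pas = run(scattered, PA_LEN)

print(f"sequential: cost={seq_cost}, PAs={seq_pas}")  # cost=4144, PAs=8
print(f"scattered:  cost={rnd_cost}, PAs={rnd_pas}")  # cost=16773120, PAs=4096
```

In this model both runs perform 4,096 allocations, but the sequential run visits 4,144 list entries in total while the scattered run visits about 16.8 million: every miss scans the whole list and then makes it one entry longer, so total work grows quadratically with the number of PAs. If each visited entry also costs a lock round-trip, as the spinlock share of the profile suggests, that cost is multiplied further; the real behavior depends on details this sketch ignores.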
      

            People

              bzzz Alex Zhuravlev
              adilger Andreas Dilger