LU-8927: osp-syn processes contending for osq_lock drive system CPU usage > 80%

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Environment: lustre-2.8.0_5.chaos-2.ch6.x86_64
      zfs-0.7.0-0.6llnl.ch6.x86_64
      DNE with 16 MDTs
    • Severity: 3

    Description

      After running jobs that created remote directories (not striped) and then ran mdtest within them, several MDS nodes are spending more than 80% of their CPU time in osp-syn-* processes.

      There are 36 osp-syn-* processes.
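
      One way to check both figures (the thread count and the CPU share) is to total CPU usage by thread name. The sketch below is a minimal illustration, not a Lustre tool. It assumes two-column text input (thread name, CPU percent) collected by whatever means is convenient, and it assumes that X and Y in osp-syn-X-Y are decimal numbers. The function name summarize_osp_syn and the sample rows are hypothetical and are not data from this report.

```python
import re

# Thread-name pattern taken from the report (osp-syn-X-Y).
# Assumption: X and Y are decimal numbers.
OSP_SYN = re.compile(r"^osp-syn-\d+-\d+$")


def summarize_osp_syn(lines):
    """Return (thread_count, total_cpu_percent) for osp-syn-* threads.

    Each input line is expected to hold two whitespace-separated
    fields: a thread name and a CPU percentage. Lines that do not
    fit that shape (headers, other threads) are skipped.
    """
    count = 0
    total = 0.0
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not OSP_SYN.match(parts[0]):
            continue
        try:
            cpu = float(parts[1])
        except ValueError:
            continue
        count += 1
        total += cpu
    return count, total


# Hypothetical sample input, NOT measurements from this issue.
sample = [
    "COMMAND %CPU",
    "osp-syn-0-1 2.5",
    "osp-syn-3-1 3.0",
    "kthreadd 0.0",
]

count, total = summarize_osp_syn(sample)
print(f"{count} osp-syn threads using {total:.1f}% CPU in total")
```

      Matching on the full name pattern rather than a bare "osp-syn" prefix keeps unrelated threads out of the total.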

      The processes are spending almost all their time contending for osq_lock. According to perf, the offending stack is:

      osq_lock
      __mutex_lock_slowpath
      mutex_lock
      spa_config_enter
      bp_get_dsize
      dmu_tx_hold_free
      osd_declare_object_destroy
      llog_osd_declare_destroy
      llog_declare_destroy
      llog_cancel_rec
      llog_cat_cancel_records
      osp_sync_process_committed
      osp_sync_process_queues
      llog_process_thread
      llog_process_or_fork
      llog_cat_process_cb
      llog_process_thread
      llog_process_or_fork
      llog_cat_process_or_fork
      llog_cat_process
      osp_sync_thread
      kthread
      ret_from_fork
      osp-syn-X-Y
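
      Read from the bottom up, the stack shows each osp-syn thread walking its llog catalog, cancelling committed records, declaring an object destroy in the OSD, and ending in mutex_lock under spa_config_enter. osq_lock is the optimistic-spin queue lock used by the kernel mutex slow path, so time spent there is time spent waiting for a mutex. The sketch below groups the frames by layer to make those hand-offs easier to see. The prefix-to-layer mapping is a naming heuristic assumed for illustration, and layer_of and collapse are hypothetical helper names.

```python
# Frames copied from the perf stack above, innermost (leaf) first.
# The trailing "osp-syn-X-Y" line is the thread name, not a frame.
STACK = """
osq_lock
__mutex_lock_slowpath
mutex_lock
spa_config_enter
bp_get_dsize
dmu_tx_hold_free
osd_declare_object_destroy
llog_osd_declare_destroy
llog_declare_destroy
llog_cancel_rec
llog_cat_cancel_records
osp_sync_process_committed
osp_sync_process_queues
llog_process_thread
llog_process_or_fork
llog_cat_process_cb
llog_process_thread
llog_process_or_fork
llog_cat_process_or_fork
llog_cat_process
osp_sync_thread
kthread
ret_from_fork
""".split()

# Assumption: symbol prefixes identify the layer. This is a naming
# heuristic for reading the trace, not something Lustre or ZFS define.
LAYERS = [
    (("osq_", "__mutex_", "mutex_"), "kernel mutex"),
    (("spa_", "bp_", "dmu_"), "ZFS"),
    (("osd_",), "Lustre OSD"),
    (("llog_",), "Lustre llog"),
    (("osp_",), "Lustre OSP"),
]


def layer_of(frame):
    """Map one symbol name to a layer label via its prefix."""
    for prefixes, name in LAYERS:
        if frame.startswith(prefixes):
            return name
    return "other"


def collapse(frames):
    """Group consecutive frames that belong to the same layer."""
    groups = []
    for frame in frames:
        layer = layer_of(frame)
        if groups and groups[-1][0] == layer:
            groups[-1][1].append(frame)
        else:
            groups.append((layer, [frame]))
    return groups


# Print outermost caller first, so the output reads in call order.
for layer, frames in reversed(collapse(STACK)):
    print(f"{layer:13s} {len(frames)} frame(s), entered via {frames[-1]}")
```

      Grouping by layer shows that the contended mutex is taken inside ZFS during the declare phase of a llog record cancel, which is the path all 36 osp-syn threads share.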

          Activity

            pjones Peter Jones made changes -
            Labels Original: llnl topllnl zfs New: llnl zfs
            ofaaland Olaf Faaland made changes -
            Resolution New: Won't Fix [ 2 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones made changes -
            Link Original: This issue is related to JFC-10 [ JFC-10 ]
            mdiep Minh Diep made changes -
            Link New: This issue is related to JFC-10 [ JFC-10 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to LU-8882 [ LU-8882 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to LU-2435 [ LU-2435 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to LU-8928 [ LU-8928 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to LU-8873 [ LU-8873 ]
            pjones Peter Jones made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Alex Zhuravlev [ bzzz ]
            ofaaland Olaf Faaland made changes -
            Description Original: After running mdtest jobs which created remote directories (not striped) and then ran within them, several MDS nodes are using >80% of their cpu time for osp-syn-* processes.
            New: Ran jobs which created remote directories (not striped) and then ran mdtest within them, several MDS nodes are using >80% of their cpu time for osp-syn-* processes.
            (Only the first sentence changed; the process count and the perf stack are identical in both versions and match the Description.)

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: ofaaland Olaf Faaland
              Votes: 0
              Watchers: 6
