Details
Description
After running jobs that created remote (non-striped) directories and then ran mdtest within them, several MDS nodes are spending more than 80% of their CPU time in osp-syn-* processes.
There are 36 osp-syn-* processes.
The processes are spending almost all their time contending for osq_lock. According to perf, the offending stack is:
osq_lock
__mutex_lock_slowpath
mutex_lock
spa_config_enter
bp_get_dsize
dmu_tx_hold_free
osd_declare_object_destroy
llog_osd_declare_destroy
llog_declare_destroy
llog_cancel_rec
llog_cat_cancel_records
osp_sync_process_committed
osp_sync_process_queues
llog_process_thread
llog_process_or_fork
llog_cat_process_cb
llog_process_thread
llog_process_or_fork
llog_cat_process_or_fork
llog_cat_process
osp_sync_thread
kthread
ret_from_fork
osp-syn-X-Y
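To make it easier to see where the contended mutex sits relative to the Lustre and ZFS code, the frames above can be grouped by layer. The following is only a rough sketch: the prefix-to-layer mapping is an assumption based on naming conventions (spa_/bp_/dmu_ as ZFS; osd_/llog_/osp_ as Lustre; everything else as Linux kernel), and the helper names `layer_of` and `summarize` are hypothetical.

```python
# Frames copied from the perf stack above, leaf (innermost) first.
STACK = """
osq_lock
__mutex_lock_slowpath
mutex_lock
spa_config_enter
bp_get_dsize
dmu_tx_hold_free
osd_declare_object_destroy
llog_osd_declare_destroy
llog_declare_destroy
llog_cancel_rec
llog_cat_cancel_records
osp_sync_process_committed
osp_sync_process_queues
llog_process_thread
llog_process_or_fork
llog_cat_process_cb
llog_process_thread
llog_process_or_fork
llog_cat_process_or_fork
llog_cat_process
osp_sync_thread
kthread
ret_from_fork
""".split()

# Assumed prefix -> layer mapping, for illustration only.
LAYERS = [
    (("spa_", "bp_", "dmu_"), "ZFS"),
    (("osd_", "llog_", "osp_"), "Lustre"),
]


def layer_of(frame):
    """Return the assumed layer for a frame, defaulting to the kernel."""
    for prefixes, name in LAYERS:
        if frame.startswith(prefixes):
            return name
    return "Linux kernel"


def summarize(stack):
    """Collapse consecutive frames of the same layer, outermost caller first."""
    runs = []
    for frame in reversed(stack):
        layer = layer_of(frame)
        if runs and runs[-1][0] == layer:
            runs[-1][1].append(frame)
        else:
            runs.append((layer, [frame]))
    return runs


if __name__ == "__main__":
    for layer, frames in summarize(STACK):
        print(f"{layer:13} {' -> '.join(frames)}")
```

Read this way, each osp-syn thread goes from llog record cancellation in Lustre, through the destroy declaration into dmu_tx_hold_free in ZFS, and ends up in the mutex taken by spa_config_enter, which is where the osq_lock spinning shows up.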
Activity
Labels | Original: llnl topllnl zfs | New: llnl zfs |
Resolution | New: Won't Fix [ 2 ] | |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Link | Original: This issue is related to JFC-10 [ JFC-10 ] |
Link | New: This issue is related to JFC-10 [ JFC-10 ] |
Assignee | Original: WC Triage [ wc-triage ] | New: Alex Zhuravlev [ bzzz ] |