Description
We ran jobs that created remote (non-striped) directories and then ran mdtest within them. Afterwards, several MDS nodes were spending more than 80% of their CPU time in osp-syn-* processes.
There are 36 osp-syn-* processes.
The processes spend almost all of their time spinning in osq_lock, which means they are contending for the mutex taken in spa_config_enter(). According to perf, the offending stack is:
osq_lock
__mutex_lock_slowpath
mutex_lock
spa_config_enter
bp_get_dsize
dmu_tx_hold_free
osd_declare_object_destroy
llog_osd_declare_destroy
llog_declare_destroy
llog_cancel_rec
llog_cat_cancel_records
osp_sync_process_committed
osp_sync_process_queues
llog_process_thread
llog_process_or_fork
llog_cat_process_cb
llog_process_thread
llog_process_or_fork
llog_cat_process_or_fork
llog_cat_process
osp_sync_thread
kthread
ret_from_fork
osp-syn-X-Y
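The stack shows that every record cancellation goes through dmu_tx_hold_free() to spa_config_enter(), so every osp-syn-* thread has to take the same mutex. The toy model below illustrates that serialization. It is only a sketch under stated assumptions. `config_lock`, `cancel_records` and `run_model` are hypothetical names and are not Lustre or ZFS code. Python's GIL and `threading.Lock` also do not reproduce the kernel's osq_lock spinning cost. The model shows only that N threads each doing K cancels make N*K acquisitions of one shared lock, and it reports how long they wait for it.

```python
import threading
import time

# Hypothetical stand-in for the single mutex every thread takes inside
# spa_config_enter() in the stack above; this is NOT Lustre/ZFS code.
config_lock = threading.Lock()


def cancel_records(n_records, stats, hold_time=0.0):
    """Model one osp-syn thread cancelling n_records llog records.

    Each cancel must take config_lock (as the declare-destroy path does in
    the reported stack), so all threads serialize on that one lock.
    """
    waited = 0.0
    for _ in range(n_records):
        start = time.perf_counter()
        with config_lock:
            # Time from the acquisition attempt until the lock was obtained.
            waited += time.perf_counter() - start
            stats["acquisitions"] += 1  # safe: protected by config_lock
            if hold_time:
                time.sleep(hold_time)  # simulated work inside the critical section
    with config_lock:
        stats["wait_seconds"] += waited


def run_model(n_threads=36, n_records=1000, hold_time=0.0):
    """Run n_threads workers (36 matches the reported osp-syn-* count)."""
    stats = {"acquisitions": 0, "wait_seconds": 0.0}
    threads = [
        threading.Thread(target=cancel_records, args=(n_records, stats, hold_time))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


if __name__ == "__main__":
    stats = run_model()
    print(stats["acquisitions"])  # 36 threads * 1000 records = 36000
    print(f"total wait: {stats['wait_seconds']:.3f} s")
```

With a non-zero `hold_time`, total wait time grows with the thread count, while total throughput stays bounded by the single lock.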