Description
After running jobs that created remote (not striped) directories and then ran mdtest within them, several MDS nodes are spending >80% of their CPU time in osp-syn-* processes.
There are 36 osp-syn-* processes.
The processes are spending almost all their time contending for osq_lock. According to perf, the offending stack is:
osq_lock
__mutex_lock_slowpath
mutex_lock
spa_config_enter
bp_get_dsize
dmu_tx_hold_free
osd_declare_object_destroy
llog_osd_declare_destroy
llog_declare_destroy
llog_cancel_rec
llog_cat_cancel_records
osp_sync_process_committed
osp_sync_process_queues
llog_process_thread
llog_process_or_fork
llog_cat_process_cb
llog_process_thread
llog_process_or_fork
llog_cat_process_or_fork
llog_cat_process
osp_sync_thread
kthread
ret_from_fork
osp-syn-X-Y
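For anyone trying to check whether a node shows the same symptom, here is a minimal sketch that counts the osp-syn-* threads and sums their CPU usage. It assumes the input is text in the two-column form produced by something like `ps -eo comm,pcpu --no-headers`; the function name `osp_sync_cpu` and the default pattern are made up for illustration and are not part of Lustre. Note that the %CPU value reported by ps is an average over the lifetime of the process, so a sampling tool may be a better fit for short bursts.

```python
import fnmatch


def osp_sync_cpu(ps_output, pattern="osp-syn-*"):
    """Return (thread_count, total_pcpu) for commands matching `pattern`.

    `ps_output` is assumed to hold one "COMM %CPU" pair per line, for
    example the output of `ps -eo comm,pcpu --no-headers`.
    """
    count = 0
    total = 0.0
    for line in ps_output.splitlines():
        # Split from the right so a command name containing spaces stays whole.
        fields = line.rsplit(None, 1)
        if len(fields) != 2:
            continue  # blank or malformed line
        comm, pcpu = fields
        if not fnmatch.fnmatchcase(comm.strip(), pattern):
            continue  # not an osp-syn-* thread
        try:
            total += float(pcpu)
        except ValueError:
            continue  # %CPU column was not numeric
        count += 1
    return count, total
```

Feeding it the captured ps output from an affected MDS should give a count matching the number of osp-syn-* processes on that node and a rough total of the CPU they consume.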
This lock contention has not resulted in problems in production, and there have been so many related changes in 2.10 and master that it's quite possible the problem does not occur there. Closing the ticket.