Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
Lustre 2.1.0
-
None
-
3
-
6428
Description
The load average on the MDS of a classified production Lustre 2.1 filesystem jumped to over 400. top showed the mdt_rdpg_* threads each using 4-7% CPU. This may have been caused by a pathological workload, but we wondered whether an overly contended lock in ldiskfs might be involved.
Most of the stacks looked like this:
__cond_resched
_cond_resched
ifind_fast
iget_locked
ldiskfs_iget
? generic_detach_inode
osd_iget
osd_ea_fid_get
osd_it_ea_rec
mdd_readpage
cml_readpage
mdt_readpage
? mdt_unpack_req_pack_rep
mdt_handle_common
? lustre_msg_get_transno
mdt_readpage_handle
ptlrpc_main
child_rip
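Read from the bottom up, the trace shows mdt_readpage reaching osd_ea_fid_get for a directory entry. That call looks up the entry's inode via osd_iget and ldiskfs_iget, and the lookup lands in the kernel inode cache (iget_locked, ifind_fast). The toy Python model below illustrates that per-entry pattern. It assumes, as the report only suspects, that every lookup serializes on one shared lock. InodeCache, iget, readpage and the "fid-N" strings are hypothetical names, not Lustre or kernel APIs. The model only shows that lock traffic scales with entries per page times concurrent readpage threads.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Inode:
    ino: int
    fid: str  # hypothetical stand-in for the FID associated with the inode


@dataclass
class InodeCache:
    """Toy inode hash table guarded by a single global lock.

    Illustrates the iget_locked -> ifind_fast call pattern only; this is
    an assumption-laden model, not the ldiskfs or kernel implementation.
    """
    buckets: int = 64
    table: dict = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)
    lock_acquisitions: int = 0  # counts lookups (iget calls) only

    def insert(self, inode: Inode) -> None:
        with self.lock:
            self.table.setdefault(inode.ino % self.buckets, []).append(inode)

    def iget(self, ino: int) -> Inode:
        # Every lookup takes the same lock, whichever bucket it hashes to.
        with self.lock:
            self.lock_acquisitions += 1
            for inode in self.table.get(ino % self.buckets, []):
                if inode.ino == ino:
                    return inode
        raise KeyError(ino)


def readpage(cache: InodeCache, entries: list[int]) -> list[tuple[int, str]]:
    """Hypothetical readpage: one iget per directory entry to fetch its FID."""
    return [(ino, cache.iget(ino).fid) for ino in entries]


if __name__ == "__main__":
    cache = InodeCache()
    for ino in range(1, 1001):
        cache.insert(Inode(ino, f"fid-{ino}"))

    # 8 concurrent "mdt_rdpg" threads, each reading one 100-entry page.
    threads = [
        threading.Thread(
            target=readpage,
            args=(cache, list(range(1 + 100 * i, 101 + 100 * i))),
        )
        for i in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(cache.lock_acquisitions)  # 800: one acquisition per entry per thread
```

In this model, doubling either the page size or the number of reader threads doubles the acquisitions of the single lock. That is the kind of scaling that could make a hot shared lock visible under a heavy readdir workload, if the report's suspicion is correct.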