[LU-15718] lnet_selftest performance issues. Created: 05/Apr/22  Updated: 06/Jun/22  Resolved: 06/Jun/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Critical
Reporter: Alexey Lyashkov Assignee: Alexey Lyashkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

High CPU usage is seen while running lnet_selftest with a 3k-4k message size.
Based on perf top, the cause is spin lock contention in one area: 60%-70% of CPU time is spent on a spin lock.
Looking at the lnet selftest code reveals a global spin lock that protects all of the work: new match bits allocation, stats, etc.
Let's replace this lock with atomic counters. This cuts the lock contention in half.
The second half is separated by lnet resource and
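
The idea of replacing the global lock with atomic counters can be sketched in user space as follows. This is only an illustration under stated assumptions, not the code from the landed patch: the names demo_state, demo_alloc_matchbits, demo_count_rpc and demo_run are hypothetical, and C11 <stdatomic.h> with pthreads stands in for the kernel's own atomic types and threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define DEMO_THREADS 4
#define DEMO_ITERS   10000
#define DEMO_TOTAL   (DEMO_THREADS * DEMO_ITERS)

/* Hypothetical stand-in for the selftest state that used to sit
 * behind one global spin lock. Names are illustrative only. */
struct demo_state {
	_Atomic uint64_t next_matchbits; /* next match bits value to hand out */
	_Atomic uint64_t rpcs_sent;      /* a simple statistics counter */
};

/* Hand out a unique match bits value without taking any lock. */
static uint64_t demo_alloc_matchbits(struct demo_state *st)
{
	return atomic_fetch_add_explicit(&st->next_matchbits, 1,
					 memory_order_relaxed);
}

/* Bump a statistics counter without taking any lock. */
static void demo_count_rpc(struct demo_state *st)
{
	atomic_fetch_add_explicit(&st->rpcs_sent, 1, memory_order_relaxed);
}

struct demo_worker {
	struct demo_state *st;
	uint64_t *got; /* values this worker was given */
};

static uint64_t demo_got[DEMO_THREADS][DEMO_ITERS];
static unsigned char demo_seen[DEMO_TOTAL];

static void *demo_worker_fn(void *arg)
{
	struct demo_worker *w = arg;

	for (int i = 0; i < DEMO_ITERS; i++) {
		w->got[i] = demo_alloc_matchbits(w->st);
		demo_count_rpc(w->st);
	}
	return NULL;
}

/* Run the workers concurrently. Returns the number of duplicate or
 * out-of-range match bits values (0 if allocation was race free) and
 * stores the final statistics counter in *sent_out. */
int demo_run(uint64_t *sent_out)
{
	struct demo_state st;
	struct demo_worker w[DEMO_THREADS];
	pthread_t tid[DEMO_THREADS];
	int dups = 0;

	atomic_init(&st.next_matchbits, 0);
	atomic_init(&st.rpcs_sent, 0);
	memset(demo_seen, 0, sizeof(demo_seen));

	for (int t = 0; t < DEMO_THREADS; t++) {
		w[t].st = &st;
		w[t].got = demo_got[t];
		int rc = pthread_create(&tid[t], NULL, demo_worker_fn, &w[t]);
		assert(rc == 0);
		(void)rc;
	}
	for (int t = 0; t < DEMO_THREADS; t++)
		pthread_join(tid[t], NULL);

	/* After all workers finish, every value must appear exactly once. */
	for (int t = 0; t < DEMO_THREADS; t++) {
		for (int i = 0; i < DEMO_ITERS; i++) {
			uint64_t v = demo_got[t][i];

			if (v >= DEMO_TOTAL || demo_seen[v]++)
				dups++;
		}
	}
	*sent_out = atomic_load(&st.rpcs_sent);
	return dups;
}
```

Each counter update is a single atomic read-modify-write, so threads no longer serialize on one lock just to take a number or bump a statistic. Relaxed ordering is used here on the assumption that the counters do not publish any other data; state that needs ordering against other memory would still need a stronger ordering or a lock.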



 Comments   
Comment by Colin Faber [ 08/Apr/22 ]

Thanks for the report shadow, are you working on a patch for this now?

Comment by Alexey Lyashkov [ 08/Apr/22 ]

Colin, yes - I am working on a patch now. It should be ready soon.

Comment by Gerrit Updater [ 06/Jun/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47002/
Subject: LU-15718 lnet: improve lnet_selftest speed
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dd5aa640781d8c6dcbbba46135e8b28b532a3840

Comment by Peter Jones [ 06/Jun/22 ]

Landed for 2.16

Generated at Sat Feb 10 03:20:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.