Improve LNet Statistics (LU-14040)

[LU-14041] LNet: CPT Statistics Created: 15/Oct/20  Updated: 24/Feb/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Technical task Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Cyril Bordage
Resolution: Unresolved Votes: 0
Labels: None


Description

Whenever a CPT lock is taken and a list is traversed under it, record the maximum time the lock is held. Soft lockups have been observed at several sites, and it took a long investigation to understand their cause. Keeping statistics on the maximum time a CPT remains locked will allow us to identify such issues more quickly.
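As an illustration of the idea only, the following user-space C sketch wraps a lock with a timestamp taken at acquisition and updates a running maximum at release. It is not taken from the patches on this ticket: the names cpt_lock_stats, cpt_stats_lock, cpt_stats_unlock, and cpt_stats_record are hypothetical, and a pthread mutex with clock_gettime() stands in for whatever locking and timing primitives the kernel code would use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical wrapper around one per-CPT lock (not an LNet structure). */
struct cpt_lock_stats {
	pthread_mutex_t lock;
	struct timespec acquired_at;	/* valid only while the lock is held */
	int             held;		/* 1 while the lock is held */
	uint64_t        max_hold_ns;	/* longest hold time seen so far */
};

static uint64_t ts_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 1000000000ULL + (uint64_t)ts->tv_nsec;
}

static void cpt_stats_init(struct cpt_lock_stats *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->held = 0;
	s->max_hold_ns = 0;
}

/* Fold one measured hold time into the running maximum.
 * Must be called with s->lock held. */
static void cpt_stats_record(struct cpt_lock_stats *s, uint64_t held_ns)
{
	if (held_ns > s->max_hold_ns)
		s->max_hold_ns = held_ns;
}

static void cpt_stats_lock(struct cpt_lock_stats *s)
{
	pthread_mutex_lock(&s->lock);
	/* Timestamp after acquisition so that wait time is not counted. */
	clock_gettime(CLOCK_MONOTONIC, &s->acquired_at);
	s->held = 1;
}

static void cpt_stats_unlock(struct cpt_lock_stats *s)
{
	struct timespec now;

	assert(s->held);
	clock_gettime(CLOCK_MONOTONIC, &now);
	/* Update the maximum before releasing, while still serialized. */
	cpt_stats_record(s, ts_to_ns(&now) - ts_to_ns(&s->acquired_at));
	s->held = 0;
	pthread_mutex_unlock(&s->lock);
}
```

Because the maximum is updated while the lock is still held, the statistic needs no extra synchronization for writers in this sketch. A reader that samples max_hold_ns without taking the lock may see a slightly stale value.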

Also track the load on the different CPTs. This will improve our ability to fine-tune the CPT configuration.
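One possible way to express per-CPT load, sketched here under assumptions, is a small table with one entry per CPT that counts lock acquisitions and accumulates hold time. The struct cpt_load type and the helpers cpt_load_account, cpt_load_busiest, and cpt_load_share are hypothetical names chosen for illustration, and the sketch is single-threaded: real code would update each entry under that CPT's own lock or with atomic operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPT load counters (not an LNet structure). */
struct cpt_load {
	uint64_t lock_count;	/* number of times the CPT lock was taken */
	uint64_t total_hold_ns;	/* cumulative time the lock was held */
};

/* Account one lock/unlock cycle of held_ns nanoseconds to a CPT. */
static void cpt_load_account(struct cpt_load *tbl, size_t cpt,
			     uint64_t held_ns)
{
	tbl[cpt].lock_count++;
	tbl[cpt].total_hold_ns += held_ns;
}

/* Return the index of the CPT with the largest cumulative hold time.
 * Ties resolve to the lowest index. ncpts must be at least 1. */
static size_t cpt_load_busiest(const struct cpt_load *tbl, size_t ncpts)
{
	size_t best = 0;

	for (size_t i = 1; i < ncpts; i++)
		if (tbl[i].total_hold_ns > tbl[best].total_hold_ns)
			best = i;
	return best;
}

/* Fraction of the total hold time spent in one CPT, in [0, 1].
 * Returns 0 when nothing has been recorded yet. */
static double cpt_load_share(const struct cpt_load *tbl, size_t ncpts,
			     size_t cpt)
{
	uint64_t total = 0;

	for (size_t i = 0; i < ncpts; i++)
		total += tbl[i].total_hold_ns;
	if (total == 0)
		return 0.0;
	return (double)tbl[cpt].total_hold_ns / (double)total;
}
```

A strongly uneven share across CPTs would suggest that the partitioning does not match the traffic pattern, which is the kind of signal that could guide tuning.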



Comments
Comment by Gerrit Updater [ 20/Oct/20 ]

[DELETED]

Comment by Gerrit Updater [ 24/Feb/21 ]

Cyril Bordage (cbordage@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41746
Subject: LU-14041 lnet: add max CPT lock duration
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 76032155795515e8e07a6ca9668d8ccea52beff6

Comment by Gerrit Updater [ 24/Feb/21 ]

Cyril Bordage (cbordage@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41747
Subject: LU-14041 lnet: display CPT timing from lnetctl
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ae796c75bd118c893cd920cbc211df2240c8612b
