[LU-12686] ln_mt_waitq use by lnet monitor thread is completely ineffective Created: 23/Aug/19  Updated: 16/Sep/19  Resolved: 16/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Minor
Reporter: Neil Brown Assignee: Neil Brown
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The ln_mt_waitq is only ever waited on by a

  wait_event_interruptible_timeout()

call which waits for 'false' to be 'true'.  As false is never true, this always waits the full timeout, which makes the waitq pointless, and explains the effort that has gone into choosing a good timeout value.

Places that wake up this waitqueue should set a flag so that the lnet_monitor thread knows something has happened and that it should stop waiting.

This is most easily done by changing the waitqueue to a completion, as a completion is a waitqueue combined with a flag (actually a counter).

 

The timeout should be changed too - probably it can then be much larger.

 



 Comments   
Comment by Gerrit Updater [ 23/Aug/19 ]

Neil Brown (neilb@suse.com) uploaded a new patch: https://review.whamcloud.com/35874
Subject: LU-12686 lnet: change ln_mt_waitq to a completion.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b4b792d5e7c0c2636cbcbc008f8149bbd9c8cc67

Comment by Gerrit Updater [ 16/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35874/
Subject: LU-12686 lnet: change ln_mt_waitq to a completion.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b81bcc6c6f0c54c48e908eccb13adc620582881e

Comment by Peter Jones [ 16/Sep/19 ]

Landed for 2.13

Generated at Sat Feb 10 02:54:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.