[LU-2799] ldlm_cbd: This service may have more threads (192) than the given soft limit (128) Created: 12/Feb/13  Updated: 06/May/13  Resolved: 06/May/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Minor
Reporter: Prakash Surya (Inactive) Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: sequoia, shh

Severity: 3
Rank (Obsolete): 6776

 Description   

I see the following message on the console of one of our Sequoia login nodes:

2013-02-12 15:26:17 Lustre: ldlm_cbd: This service may have more threads (192) than the given soft limit (128)

Is this troubling? What's the net effect of this?



 Comments   
Comment by Peter Jones [ 13/Feb/13 ]

Nathaniel

Could you please look into this one?

Thanks

Peter

Comment by Nathaniel Clark [ 13/Feb/13 ]

The message is purely informational. It is printed if you specify more than 128 threads via ptlrpc's ldlm_num_threads module parameter, or if the number of cores is large. "Example 3" in lustre/include/lustre_net.h:250 states:

On 64-core machine with 8 partitions we will need LDLM_NTHRS_BASE(24)
threads for each partition to keep service healthy, so total threads
number should be 24 * 8 = 192.

This is neither harmful nor worrisome.
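
For illustration only, the arithmetic in the quoted header comment can be sketched as below. The values 24, 8, 192 and 128 come from the comment and the console message above. The function name is hypothetical and is not part of Lustre, and the partition count is the header's example rather than a measured value for the node in question.

```python
# Rough sketch of the thread-count arithmetic quoted from lustre_net.h.
# total_threads() is a hypothetical helper, not a Lustre function.
LDLM_NTHRS_BASE = 24  # threads per partition, per the quoted "example 3"
SOFT_LIMIT = 128      # soft limit reported in the console message


def total_threads(partitions, per_partition=LDLM_NTHRS_BASE):
    """Return the total service threads for a given number of partitions."""
    return partitions * per_partition


total = total_threads(8)  # 8 partitions, as in the header's example
print(total, total > SOFT_LIMIT)
```

With 8 partitions the total is 24 * 8 = 192, which exceeds the soft limit of 128 and so triggers the message.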

Comment by Prakash Surya (Inactive) [ 13/Feb/13 ]

Thanks. It should really be removed in that case. If an administrator can safely ignore the message, it should not make it to the console.

Comment by Prakash Surya (Inactive) [ 13/Feb/13 ]

Unless somebody has a convincing reason why the message should stay, I'm reopening this ticket in the hopes that it is removed.

Comment by Andreas Dilger [ 13/Feb/13 ]

Liang, any reason this message should be kept? Would it be better to limit the number of threads?

Prakash, how many sockets/cores are on this login node?

Comment by Prakash Surya (Inactive) [ 13/Feb/13 ]

48 on this node:

$ cat /proc/cpuinfo | grep proc | wc -l
48

Comment by Nathaniel Clark [ 13/Feb/13 ]

Andreas, Liang,
Should the message be changed to a CWARN instead? If an upper limit (even a soft one) is being exceeded, I would think it should be logged.

Prakash, Sorry about closing the bug prematurely.

Comment by Andreas Dilger [ 13/Feb/13 ]

I'm more inclined to just quiet the message entirely, i.e. CDEBUG(), since there isn't anything the sysadmin can or should do about it.

I guess at some point we need to look at whether there should be one set of threads running on each of the cores, or if one set of threads per socket is enough?

Comment by Nathaniel Clark [ 16/Feb/13 ]

Patch to make the message a CDEBUG:
http://review.whamcloud.com/5447

Comment by Nathaniel Clark [ 24/Apr/13 ]

Patch has landed

Comment by Prakash Surya (Inactive) [ 24/Apr/13 ]

Thanks!
