[LU-196] Set debug_mb size for every node Created: 07/Apr/11  Updated: 14/Jul/11  Resolved: 14/Jul/11

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0, Lustre 1.8.6
Fix Version/s: Lustre 2.1.0, Lustre 1.8.6

Type: Bug Priority: Minor
Reporter: Yang Sheng Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Bugzilla ID: 19,944
Rank (Obsolete): 4962

 Description   

We set the debug_mb size based on the num_cpus of a node. The number of CPUs may differ between the nodes of a cluster (clients or servers), so debug_mb needs to be set separately on every node.



 Comments   
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » client,el5-x86_64 #105
LU-196 Set debug_mb size for every node.

yangsheng : 28e45d7d730cabe0cf23706c5ad0c55d03adabd2
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » client,el6-x86_64 #105
LU-196 Set debug_mb size for every node.

yangsheng : 28e45d7d730cabe0cf23706c5ad0c55d03adabd2
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » client,el5-i686 #105
LU-196 Set debug_mb size for every node.

yangsheng : 28e45d7d730cabe0cf23706c5ad0c55d03adabd2
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » client,ubuntu-x86_64 #105
LU-196 Set debug_mb size for every node.

yangsheng : 28e45d7d730cabe0cf23706c5ad0c55d03adabd2
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » client,el6-i686 #105
LU-196 Set debug_mb size for every node.

yangsheng : 28e45d7d730cabe0cf23706c5ad0c55d03adabd2
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » server,el6-i686 #105
LU-196 Set debug_mb size for every node.

yangsheng : 28e45d7d730cabe0cf23706c5ad0c55d03adabd2
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » server,el5-x86_64 #105
LU-196 Set debug_mb size for every node.

yangsheng : 28e45d7d730cabe0cf23706c5ad0c55d03adabd2
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » server,el5-i686 #105
LU-196 Set debug_mb size for every node.

yangsheng : 28e45d7d730cabe0cf23706c5ad0c55d03adabd2
Files :

  • lustre/tests/test-framework.sh
Comment by Jian Yu [ 21/Apr/11 ]

Hi Yang Sheng,
I found b1_8 also had the issue described in the comment #2 for patch set 3 in http://review.whamcloud.com/407. Could you please also fix it in b1_8?

Comment by Chris Gearing (Inactive) [ 27/Apr/11 ]

Can you provide more information about how debug_mb is used, and also PTLDEBUG and SUBSYSTEM, which are set in similar places in the code? How do people use them, and why do they want to change them?

Then we can work out a requirement and a solution.

Some of this will be my lack of understanding, but the discussion might clarify what is required.

Comment by Oleg Drokin [ 11/May/11 ]

debug_mb limits the amount of memory used for the debug log.
Often the initial default is too small to hold logs large enough to diagnose problems. To avoid this, some test scripts set debug_mb to a value that has proved in the past to retain enough data.

Now for the problems with that approach: the logs are actually held as per-CPU lists of pages, and the limit is at least 1M per CPU, I think (it might be even more). When you ask for less, the request is rejected. (In fact there are also 3 queues per CPU for different circumstances, with a factor of 10 difference between each.)
For a system with 4 CPUs, asking for a debug_mb of 20 is reasonable.
But on a quad-CPU, quad-core system with HT the kernel sees 32 "cpus", so 20M for logs would not be enough and the value would be rejected.
Since the default value is bigger than the one we are trying to set anyway, we could probably ignore the failure. On the other hand, there is the question of whether the per-CPU log queues would be exhausted ahead of time, so that we still would not get useful debug data should anything happen. If that is the case, we might want to scale the value with the number of CPUs too.
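
To make the scaling idea above concrete, here is a minimal sketch in shell. It assumes the floor really is about 1 MB per CPU, which the comment above states only tentatively; the function name scaled_debug_mb and the MB_PER_CPU variable are hypothetical and are not taken from test-framework.sh.

```shell
#!/bin/bash
# Assumed floor of 1 MB per CPU (the comment above says "at least 1M per CPU
# I think", so treat this number as a guess, not a verified kernel limit).
MB_PER_CPU=1

# Hypothetical helper: print the debug_mb value to request on one node.
# $1 = number of CPUs the kernel reports on that node
# $2 = size in MB that the test script would like to have
scaled_debug_mb() {
    local ncpus=$1
    local requested=$2
    local min_mb=$((ncpus * MB_PER_CPU))

    # Raise the request to the assumed per-CPU floor so it is not rejected.
    if [ "$requested" -lt "$min_mb" ]; then
        echo "$min_mb"
    else
        echo "$requested"
    fi
}

# Decide the value locally, on whichever node runs this script.
# Falls back to 1 CPU if getconf cannot report the online CPU count.
ncpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "debug_mb=$(scaled_debug_mb "$ncpus" 20)"
```

With a request of 20, a 4-CPU node keeps 20 while a node that reports 32 CPUs is raised to 32. Because the CPU count is read on the node itself, running the same logic on every client and server yields a value suited to each one. How the value is then applied (for example through lctl set_param) is left out of the sketch.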

As for PTLDEBUG and SUBSYSTEM: every debug statement (CDEBUG(level, "....", ....) is one of the primitives) produces output only if its "level" bitmask contains at least one bit from PTLDEBUG (which is actually written to /proc/sys/lnet/debug) and its subsystem (set at the beginning of each file) matches the SUBSYSTEM mask.
This allows us to fine-tune at run time which debug messages we want from a system, instead of recompiling all the time.
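
The two-mask check described above can be illustrated with a small shell sketch. The bit names and values below (LEVEL_A, LEVEL_B, SUBSYS_X, SUBSYS_Y) and the should_log function are hypothetical; the real Lustre masks are defined in the source tree and are not reproduced here.

```shell
#!/bin/bash
# Hypothetical bit assignments, for illustration only.
LEVEL_A=$((1 << 0))
LEVEL_B=$((1 << 1))
SUBSYS_X=$((1 << 0))
SUBSYS_Y=$((1 << 1))

# Succeeds (exit status 0) if a message would be emitted.
# $1 = level bits of the message     $2 = subsystem bit of the message
# $3 = enabled debug mask            $4 = enabled subsystem mask
should_log() {
    local level=$1 subsys=$2 debug_mask=$3 subsys_mask=$4
    # Both masks must match: at least one level bit AND the subsystem bit.
    [ $((level & debug_mask)) -ne 0 ] && [ $((subsys & subsys_mask)) -ne 0 ]
}

debug_mask=$LEVEL_A                    # plays the role of PTLDEBUG
subsys_mask=$((SUBSYS_X | SUBSYS_Y))   # plays the role of SUBSYSTEM

if should_log "$LEVEL_A" "$SUBSYS_Y" "$debug_mask" "$subsys_mask"; then
    echo "LEVEL_A message from SUBSYS_Y: logged"
fi
if ! should_log "$LEVEL_B" "$SUBSYS_Y" "$debug_mask" "$subsys_mask"; then
    echo "LEVEL_B message from SUBSYS_Y: filtered"
fi
```

Changing either mask at run time changes which messages pass the check, which is the point made above about not having to recompile.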

Comment by Yang Sheng [ 20/May/11 ]

Hi Yujian, do you have time to land this patch on master? I think you know the test system better than I do, so feel free to take this issue.

Comment by Jian Yu [ 23/May/11 ]

Hi Yujian, do you have time to land this patch on master? I think you know the test system better than I do, so feel free to take this issue.

OK, I will work on this.

Comment by Jian Yu [ 24/May/11 ]

Patch for b1_8 branch: http://review.whamcloud.com/593.
Patch for master branch: http://review.whamcloud.com/407.

Comment by Build Master (Inactive) [ 24/May/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,ofa #61
LU-196 set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node

Johann Lombardi : 8fb2481344e5e59b264eaee720c96844a9a7c9fe
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 24/May/11 ]

Integrated in lustre-b1_8 » i686,client,el5,inkernel #61
LU-196 set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node

Johann Lombardi : 8fb2481344e5e59b264eaee720c96844a9a7c9fe
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 24/May/11 ]

Integrated in lustre-b1_8 » x86_64,client,ubuntu1004,inkernel #61
LU-196 set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node

Johann Lombardi : 8fb2481344e5e59b264eaee720c96844a9a7c9fe
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 24/May/11 ]

Integrated in lustre-b1_8 » i686,client,el5,ofa #61
LU-196 set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node

Johann Lombardi : 8fb2481344e5e59b264eaee720c96844a9a7c9fe
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 24/May/11 ]

Integrated in lustre-b1_8 » i686,client,el6,inkernel #61
LU-196 set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node

Johann Lombardi : 8fb2481344e5e59b264eaee720c96844a9a7c9fe
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 24/May/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,ofa #61
LU-196 set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node

Johann Lombardi : 8fb2481344e5e59b264eaee720c96844a9a7c9fe
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 24/May/11 ]

Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #61
LU-196 set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node

Johann Lombardi : 8fb2481344e5e59b264eaee720c96844a9a7c9fe
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 24/May/11 ]

Integrated in lustre-b1_8 » i686,server,el5,inkernel #61
LU-196 set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node

Johann Lombardi : 8fb2481344e5e59b264eaee720c96844a9a7c9fe
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 24/May/11 ]

Integrated in lustre-b1_8 » i686,server,el5,ofa #61
LU-196 set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node

Johann Lombardi : 8fb2481344e5e59b264eaee720c96844a9a7c9fe
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 24/May/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #61
LU-196 set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node

Johann Lombardi : 8fb2481344e5e59b264eaee720c96844a9a7c9fe
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 24/May/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #61
LU-196 set $PTLDEBUG, $SUBSYSTEM and $DEBUG_SIZE values on every node

Johann Lombardi : 8fb2481344e5e59b264eaee720c96844a9a7c9fe
Files :

  • lustre/tests/test-framework.sh
Comment by Jian Yu [ 08/Jun/11 ]

Patch for master branch: http://review.whamcloud.com/407 is still under review.

Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » i686,server,el5,ofa #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » i686,client,el5,ofa #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Build Master (Inactive) [ 14/Jul/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #206
LU-196 set debug_mb size for every node

Oleg Drokin : d5fe82b3000578cff0af3d293a7ffda8d6de9b46
Files :

  • lustre/tests/test-framework.sh
Comment by Jian Yu [ 14/Jul/11 ]

Patches have been pushed to the b1_8 and master branches in the fs/lustre-release git repository. The issue is fixed.

Generated at Sat Feb 10 01:04:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.