[LU-12216] lctl set_param debug_mb=xxx set debug buffer to the wrong size Created: 23/Apr/19 Updated: 23/Apr/19 Resolved: 23/Apr/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Mahmoud Hanafi | Assignee: | Patrick Farrell (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
lctl set_param debug_mb was always in MB. But in 2.10.x its wrong. lctl get_param debug_mb Results in log file that is 474MB
|
| Comments |
| Comment by Patrick Farrell (Inactive) [ 23/Apr/19 ] |
|
Mahmoud, There has never been a fixed upper limit on how large debug_mb can be set. The behavior you're seeing is intended. Additionally, the debug_mb value is the total buffer size across all CPUs ('CPU' here is defined as "thing a process can be scheduled on, visible in cat /proc/cpuinfo"), the actual buffer space is evenly divided among CPUs. So if you have a 4801 MiB debug buffer and 16 CPUs, each CPU has ~300 MiB of debug buffer. This means that if you have a single threaded workload (like a single process reading or writing a file), it will fill the 300 MiB and then begin to wrap around. Because Lustre has various network daemons, you will see some activity on other CPUs, so you will never get debug messages from just one CPU. So this means that you will never get [with these settings and 16 CPU cores] more than 300 MiB of log (at one time) from a given CPU. I am assuming when you say you have a 474 MiB debug log, you mean that's the largest log you get when recording a particular activity, and you expect more. (it is also possible that you are recording everything, if you have a time limited process). If you look at a Lustre debug log line: 00000100:00000001:3.0:1555092720.263653:0:6377:0:(client.c:701:ptlrpc_request_bufs_pack()) Process entered The third field (3.0 here) reflects the CPU in use. (3 is the CPU, 0 reflects whether or not you're in an interrupt context - it will mostly be 0). But, assuming you are running out of debug buffer: All of this is expected. If you need a larger per CPU buffer, the solution is simply to set an extremely large debug_mb. debug_mb is not allocated until it is used, and anything that was used is freed when you set it back to a lower value, so setting it to enormous values for a short time to capture a specific event is OK. (Do not leave it at those large values or you will be wasting memory. This is purely a temporary thing for gathering logs.) |
| Comment by Mahmoud Hanafi [ 23/Apr/19 ] |
|
In the past the upper limit was like 4GB. At least in <2.7.x |
| Comment by Patrick Farrell (Inactive) [ 23/Apr/19 ] |
|
Mahmoud, Perhaps you had nodes with a small memory footprint. The limit for debug_mb has been 80% of memory since some time before Lustre 2.0 was released. In any case, the behavior here is all as intended - Do you have any further questions? |
| Comment by Mahmoud Hanafi [ 23/Apr/19 ] |
|
no, thank you |
| Comment by Patrick Farrell (Inactive) [ 23/Apr/19 ] |
|
Thanks, Mahmoud! |