Details
-
Improvement
-
Resolution: Fixed
-
Minor
Description
When investigating performance problems, it is very important to be able to trace the I/O patterns from userspace into Lustre, and also to understand the basics of how Lustre handles that I/O (readahead, RPC generation).
Tracing is extremely difficult to do with any single userspace tool: strace misses page faults entirely, and the perf tracing options vary from kernel to kernel. And of course, userspace tools cannot tell you what Lustre did internally.
The right place for this information is the Lustre debug logs. Unfortunately, the needed information is spread across a variety of debug flags, and is sometimes not logged at all. As a result, the only way to get this information is to enable a very heavyweight mask of debug flags, and even then some things must be inferred from other messages or from the absence of messages.
Ideally, we would have a debug flag that covers a small set of messages giving basic I/O tracing information, enough to understand the I/O pattern from userspace and to show the basics of how Lustre handles it. This needs to be a dedicated flag, both for simplicity and to keep log lengths manageable. Tracing I/O should not require wading through a huge mass of unrelated messages, and should not require huge log files. A dedicated debug flag will accomplish this.
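As a rough illustration of the difference, the sketch below filters dumped debug log lines by mask. It is only a sketch under stated assumptions: the colon-separated line layout, the mask bit values, and the sample messages are all made up for the example and are not taken from this ticket or from any particular Lustre release. With a heavyweight mask, the I/O-relevant lines have to be picked out of everything else after the fact; with a dedicated flag, only the matching lines would be logged in the first place.

```python
import re

# ASSUMED layout of one dumped debug log line (illustrative, not confirmed here):
#   subsystem:mask:cpu:timestamp:stack:pid:ext_pid:(file:line:func()) message
LINE_RE = re.compile(
    r"^(?P<subsys>[0-9a-fA-F]{8}):(?P<mask>[0-9a-fA-F]{8}):"
    r"[^:]*:(?P<ts>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\((?P<where>.*?\(\))\)\s*(?P<msg>.*)$"
)

# HYPOTHETICAL mask bits, chosen only for this example.
EXAMPLE_IO_BIT = 0x00200000
EXAMPLE_RPC_BIT = 0x00100000


def filter_by_mask(lines, wanted_mask):
    """Return parsed entries whose debug mask shares a bit with wanted_mask."""
    selected = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that does not look like a log line
        if int(match.group("mask"), 16) & wanted_mask:
            selected.append({
                "timestamp": float(match.group("ts")),
                "pid": int(match.group("pid")),
                "where": match.group("where"),
                "msg": match.group("msg"),
            })
    return selected


# Made-up sample lines in the assumed format.
SAMPLE = [
    "00000080:00200000:1.0:1700000000.000001:0:4242:0:"
    "(file.c:100:example_read()) read offset 0 len 4096",
    "00000100:00100000:1.0:1700000000.000002:0:4242:0:"
    "(client.c:200:example_send()) sending rpc",
    "00010000:00010000:1.0:1700000000.000003:0:4242:0:"
    "(lock.c:300:example_lock()) lock granted",
    "not a log line",
]

if __name__ == "__main__":
    for entry in filter_by_mask(SAMPLE, EXAMPLE_IO_BIT):
        print(entry["timestamp"], entry["pid"], entry["where"], entry["msg"])
```

Post-filtering like this still requires the full log to be captured first, which is exactly the log-size cost a dedicated flag is meant to avoid.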