[LU-9072] Upstream lnet-selftest causing node to crash on load Created: 01/Feb/17 Updated: 13/Apr/17 Resolved: 13/Apr/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Upstream |
| Fix Version/s: | Upstream |
| Type: | Bug | Priority: | Critical |
| Reporter: | Doug Oucharek (Inactive) | Assignee: | Doug Oucharek (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The upstream version of lnet-selftest crashes the node when the module is loaded. A kernel developer changed the definition of kiov in both LNet and lnet-selftest, and the crash may be related to that change. In one run, I saw this log message before the crash:

LNet: 16216:0:(framework.c:1712:sfw_startup()) Failed to reserve enough buffers: service debug, 256 needed: -30720

This may be a hint that the node is running out of memory, which would in turn destabilize the kernel. The new kiov code may be exhausting memory, especially when allocating per-CPT. |
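As a small triage aid for the message quoted in the description, the sketch below splits that log line into fields and checks whether the trailing value corresponds to a known errno name. The regular expression, the field names (including `field2`, whose meaning is not assumed here), and the helper functions are hypothetical and written only for this one message shape; they are not part of LNet or lnet-selftest, and the check says nothing about the root cause.

```python
import errno
import re

# Log line copied verbatim from the ticket description.
LOG_LINE = (
    "LNet: 16216:0:(framework.c:1712:sfw_startup()) "
    "Failed to reserve enough buffers: service debug, 256 needed: -30720"
)

# Hypothetical pattern for this single message shape; not a general log parser.
PATTERN = re.compile(
    r"^(?P<subsys>\w+): (?P<pid>\d+):(?P<field2>\d+):"
    r"\((?P<file>[\w.\-]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*): (?P<rc>-?\d+)$"
)


def parse_line(text):
    """Split one log line into named fields, or return None if it does not match."""
    m = PATTERN.match(text)
    if m is None:
        return None
    fields = m.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("pid", "field2", "line", "rc"):
        fields[key] = int(fields[key])
    return fields


def errno_name(rc):
    """Return the symbolic errno name for a negative return code, or None."""
    return errno.errorcode.get(-rc)


fields = parse_line(LOG_LINE)
print(fields["func"], fields["rc"], errno_name(fields["rc"]))
```

If the trailing value has no errno name, as a plain out-of-memory failure (`ENOMEM`) would, that is only an observation about the number logged and should be weighed alongside the memory-exhaustion hypothesis rather than taken as evidence for or against it.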
| Comments |
| Comment by Doug Oucharek (Inactive) [ 23/Feb/17 ] |
|
Oleg figured this out and submitted a patch upstream. I cannot find the "send-email" email right now, but will post it here when I do. Once the patch lands, this ticket will be marked as resolved. The problem did not affect master, so no patch is needed there. |