[LU-9142] (lib-lnet.h:293:lnet_msg_alloc()) LBUG
Created: 22/Feb/17 Updated: 03/Mar/17 Resolved: 28/Feb/17
| Field | Value |
|---|---|
| Status | Resolved |
| Project | Lustre |
| Component/s | None |
| Affects Version/s | Lustre 2.8.0 |
| Fix Version/s | None |
| Type | Bug |
| Priority | Major |
| Reporter | Ruth Klundt (Inactive) |
| Assignee | Amir Shehata (Inactive) |
| Resolution | Not a Bug |
| Votes | 0 |
| Labels | None |
| Environment | servers and clients: |
| Severity | 3 |
| Rank (Obsolete) | 9223372036854775807 |
Description

A sleeping function was called from an invalid context at ldiskfs/ext4_jbd2.c:259, followed by an LBUG:

2017-02-21 17:51:20 [25522.565474] BUG: sleeping function called from invalid context at /builddir/build/BUILD/lustre-2.8.0_8.chaos/ldiskfs/ext4_jbd2.c:259
Comments

Comment by Ruth Klundt (Inactive) [ 22/Feb/17 ]

The I/O load was IOR, on a small test cluster of about 200 nodes in total.
Comment by Peter Jones [ 22/Feb/17 ]

Amir, could you please advise on this one?

Thanks,
Peter
Comment by Amir Shehata (Inactive) [ 27/Feb/17 ]

Is this crash on the server or on the client? If it is on the server, has the kernel been patched?
Comment by Ruth Klundt (Inactive) [ 27/Feb/17 ]

It's the MDS. I'll check on the kernel, which is also from LLNL.
Comment by Ruth Klundt (Inactive) [ 27/Feb/17 ]

The kernel has the following patches in the source RPM: ACPI-APEI-GHES-Further-instrumentation-refinements.patch
Comment by Amir Shehata (Inactive) [ 27/Feb/17 ]

This doesn't look like a Lustre-patched kernel. I would expect the following patches to be applied:

According to Andreas, LLNL doesn't patch the kernel for Lustre anymore because they use ZFS. Of those patches, jbd2-fix is the one that actually fixes a bug, and judging from the crash log in this ticket it could be the cause of your problem. Please make sure the kernel on the server nodes is properly patched.
Comment by Ruth Klundt (Inactive) [ 28/Feb/17 ]

Thanks for the diagnosis; you can close this bug.
Comment by Amir Shehata (Inactive) [ 28/Feb/17 ]

The issue was caused by a kernel without the Lustre patches applied.
Comment by Ruth Klundt (Inactive) [ 01/Mar/17 ]

FYI, the jbd2 patch appears to be obsolete for RHEL 7.3; it was generated against 3.10.0-327.36.1. The kernel I'm currently looking at is 3.10.0-514, where there is only one spin_lock before the if-else block and one unlock after it, whereas the patch shows a spin_lock inside each branch of the if-else. I'd be surprised if that patch applies cleanly, even against stock RHEL kernels.
Comment by Ruth Klundt (Inactive) [ 01/Mar/17 ]

Sorry, please ignore my last comment; I was not looking at a Red Hat kernel.
Comment by Amir Shehata (Inactive) [ 03/Mar/17 ]

Also note that, as far as I can tell, 2.8 supports only the following RHEL 7.x kernel: 3.10-rhel7.series 3.10.0-327.3.1.el7 (RHEL 7.2).

Full list of supported kernel versions for 2.8.0:

| Patch series | Kernel version | Distribution |
|---|---|---|
| 2.6-rhel6.series | 2.6.32-431.29.2.el6 | RHEL 6.5 |
| 2.6-rhel6.series | 2.6.32-504.30.3.el6 | RHEL 6.6 |
| 2.6-rhel6.series | 2.6.32-573.12.1.el6 | RHEL 6.7 |
| 3.10-rhel7.series | 3.10.0-327.3.1.el7 | RHEL 7.2 |
| 3.0-sles11sp3.series | 3.0.101-0.47.71 | SLES11 SP3 |
| 3.0-sles11sp3.series | 3.0.101-68 | SLES11 SP4 |