[LU-4091] Kernel Panic MDS jbd2_journal_start+0x4f/0x110 [jbd2]
Created: 11/Oct/13 | Updated: 20/May/14 | Resolved: 20/May/14
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Hussein N. Harake (Inactive) | Assignee: | Peter Jones |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Environment: | 2 sockets, Intel E5-2670 CPU @ 2.60 GHz; 32 GB DDR3 memory at 1600 MHz |
| Severity: | 4 |
| Rank (Obsolete): | 10997 |
| Description |
Under a certain load while benchmarking metadata performance, the MDS crashes with a kernel panic. The job runs on 16 MDTs, all served by one MDS.

Oct 11 11:44:49 greina15 kernel: BUG: unable to handle kernel NULL pointer dereference at (null)

Regards
| Comments |
| Comment by Gabriele Paciucci (Inactive) [ 11/Oct/13 ] |
Do you have any idea of the system's load average during your experiments?
| Comment by Hussein N. Harake (Inactive) [ 11/Oct/13 ] |
Under load, CPU usage was an estimated 60 to 85% and memory usage about 80%.
| Comment by Hussein N. Harake (Inactive) [ 15/Oct/13 ] |
I upgraded the MDS server to 64 GB of memory (from 32 GB); the test has now passed a second time without any crash.
| Comment by Gabriele Paciucci (Inactive) [ 15/Oct/13 ] |
Hi Hussein,
| Comment by Gabriele Paciucci (Inactive) [ 15/Oct/13 ] |
The memory needed by the MDS depends on the number of clients, the size of the file system journal, and the number of locks. If we assume that you have 16 clients locking 100k files each, and 16 MDTs with a 400 MB journal each (the default):

Operating system overhead = 2 GB

Additional RAM is used for caching file data for the working set, which is not actively in use by clients but should be kept "hot" for improved access times. Approximately 1.5 KB per file is needed to keep a file in cache without a lock.
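As a rough illustration of how the terms listed above add up, here is a minimal Python sketch for the scenario in this comment (16 clients with 100k locked files each, 16 MDTs with a 400 MB journal each, 2 GB OS overhead, 1.5 KB per cached unlocked file). The comment does not state a per-lock memory cost, so the 2 KB per locked file used here is an assumption, as are binary units (1 GB = 1024 MB); the function name `estimate_mds_ram_mib` is hypothetical.

```python
def estimate_mds_ram_mib(
    n_clients: int,
    locked_files_per_client: int,
    n_mdts: int,
    journal_mib_per_mdt: float = 400.0,   # default journal size per MDT, per the comment
    os_overhead_mib: float = 2 * 1024.0,  # 2 GB operating system overhead, per the comment
    lock_kib_per_file: float = 2.0,       # ASSUMPTION: per-lock cost is not given in the comment
    cached_unlocked_files: int = 0,       # working-set files kept "hot" without a lock
    cache_kib_per_file: float = 1.5,      # ~1.5 KB per cached unlocked file, per the comment
) -> float:
    """Return a rough MDS RAM estimate in MiB by summing the individual terms."""
    lock_mib = n_clients * locked_files_per_client * lock_kib_per_file / 1024
    journal_mib = n_mdts * journal_mib_per_mdt
    cache_mib = cached_unlocked_files * cache_kib_per_file / 1024
    return os_overhead_mib + journal_mib + lock_mib + cache_mib


if __name__ == "__main__":
    total = estimate_mds_ram_mib(n_clients=16, locked_files_per_client=100_000, n_mdts=16)
    print(f"{total:.0f} MiB (~{total / 1024:.1f} GiB) before working-set cache")
```

Under these assumptions the estimate is 11,573 MiB (about 11.3 GiB) before any working-set cache, and the sixteen 400 MB journals (6,400 MiB) are the largest single term; every 1,048,576 unlocked files kept in cache would add another 1,536 MiB.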
| Comment by Gabriele Paciucci (Inactive) [ 15/Oct/13 ] |
In our design best practices for Sandy Bridge/Ivy Bridge processors, we suggest populating all available memory slots (128 GB). For modern, high-speed, low-latency (SSD-based) MDTs, we also suggest increasing the journal size to 4 GB.
| Comment by Hussein N. Harake (Inactive) [ 15/Oct/13 ] |
An excerpt from /var/log/messages is already in the description; I don't have a crash dump.
| Comment by Peter Jones [ 20/May/14 ] |
Per CSCS, this ticket is OK to close: the problem has not recurred since extra memory was added to the affected system.