[LU-1325] Loading a large enough binary from Lustre triggers the OOM killer during page_fault while a large amount of memory is available Created: 16/Apr/12 Updated: 07/Jun/12 Resolved: 07/Jun/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexandre Louvet | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6414 |
| Description |
|
While loading a large enough binary, we hit the OOM killer during a page fault while the system still has a lot of free memory available (in our case, 60 GB of free memory on a node with 64 GB installed). The problem doesn't show up if the binary is not big enough or if there isn't enough concurrency: a simple ls works, and so does a small program, but once the binary grows to a few MB, with some DSOs around, and is run with mpirun, the page fault appears to be interrupted by a signal inside cl_lock_state_wait. The error code is then returned up to ll_fault0, where it is replaced by VM_FAULT_ERROR, which triggers the OOM. Here is the extract from the trace collected (and attached):

We are able to reproduce the problem at will by scheduling, through the batch scheduler, an MPI job of 32 cores on 2 nodes (16 cores per node) on the customer system. I haven't been able to reproduce it on another system. I also tried to retrieve the culprit signal by setting panic_on_oom, but unfortunately it seems to have been cleared during the OOM handling. Running strace is too complicated with the MPI layer.

Alex. |
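For illustration, here is a minimal userspace sketch of the suspected failure path. This is an assumption-laden model, not the actual Lustre code: the _sim functions and the VM_FAULT_* values are made up for the demo. The point is the mapping described above: an interruptible lock wait returns -EINTR when a signal lands, and the fault handler collapses that transient error into a fatal VM_FAULT_ERROR instead of letting the VM retry, which is what pushes the kernel toward the OOM path despite the free memory.

```c
/*
 * Userspace model of the suspected failure path (simplified
 * illustration, not the actual Lustre ll_fault0 code).
 */
#include <stdio.h>
#include <errno.h>

#define VM_FAULT_ERROR  0x01   /* illustrative values, not the kernel's */
#define VM_FAULT_RETRY  0x02

/* Stand-in for cl_lock_state_wait(): pretend a signal interrupted us. */
static int cl_lock_state_wait_sim(int signal_pending)
{
    return signal_pending ? -EINTR : 0;
}

/* Suspected bug: any error, including a transient -EINTR, becomes fatal. */
static int ll_fault0_sim(int signal_pending)
{
    int rc = cl_lock_state_wait_sim(signal_pending);

    if (rc != 0)
        return VM_FAULT_ERROR;   /* arguably should be VM_FAULT_RETRY */
    return 0;
}

int main(void)
{
    printf("no signal : fault result = %d\n", ll_fault0_sim(0));
    printf("signalled : fault result = %d (VM_FAULT_ERROR -> OOM path)\n",
           ll_fault0_sim(1));
    return 0;
}
```

Compiled with a plain `cc` invocation, the second line shows the signalled case taking the fatal branch even though nothing is actually wrong with memory.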
| Comments |
| Comment by Peter Jones [ 16/Apr/12 ] |
|
Jinshan will look into this one |
| Comment by Jinshan Xiong (Inactive) [ 18/Apr/12 ] |
|
Please try patch http://review.whamcloud.com/2574. |
| Comment by Bruno Faccini (Inactive) [ 15/May/12 ] |
|
A nasty side-effect/consequence of this problem is that it often (always?) leaves processes stuck on at least one mm_struct->mmap_sem, and the owner of the semaphore is impossible to find. This may come from a hole/bug in the OOM algorithm that allows a process to either take the semaphore and leave, or self-deadlock on it... The bad thing is that an affected node finally has to be rebooted, since commands like "ps/pidof/swapoff/..." also block forever on these semaphores. |
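A rough userspace model of that hang, assuming the "take the semaphore and leave" scenario (a pthread rwlock stands in for mm_struct->mmap_sem; this is an illustration, not kernel code):

```c
/*
 * If a task dies while still holding mmap_sem for reading, the
 * semaphore is never released, and every later writer -- and the
 * readers queued behind it, such as ps/pidof walking /proc --
 * blocks forever.  Modeled here with a pthread rwlock.
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <errno.h>

static pthread_rwlock_t mmap_sem = PTHREAD_RWLOCK_INITIALIZER;

/* The "victim": grabs the lock for reading and exits without
 * unlocking, standing in for a task reaped mid-fault. */
static void *victim(void *arg)
{
    pthread_rwlock_rdlock(&mmap_sem);
    return NULL;                 /* dies holding the lock */
}

int main(void)
{
    pthread_t t;
    struct timespec deadline;

    pthread_create(&t, NULL, victim, NULL);
    pthread_join(t, NULL);

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 2;        /* don't hang the demo forever */

    /* A writer (e.g. mmap/munmap in another task) now waits forever. */
    if (pthread_rwlock_timedwrlock(&mmap_sem, &deadline) == ETIMEDOUT)
        printf("writer stuck: lock owner is gone, node needs a reboot\n");
    return 0;
}
```

The real kernel case is worse than this demo because there is no timeout: the blocked tasks are unkillable, which matches the "node has to be rebooted" symptom above.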
| Comment by Jinshan Xiong (Inactive) [ 15/May/12 ] |
|
Hi Bruno, which version of the patch are you running? I saw this problem in earlier versions, but it should have been fixed in patch set 7. |
| Comment by Bruno Faccini (Inactive) [ 23/May/12 ] |
|
I will ask our/Bull integration team and let you know. |
| Comment by Peter Jones [ 04/Jun/12 ] |
|
Bruno

Any answer on this yet? Can we mark this as a duplicate of

Peter |
| Comment by Alexandre Louvet [ 07/Jun/12 ] |
|
To answer Jinshan's question, we never got any patch from this Jira (nor

Alex. |
| Comment by Peter Jones [ 07/Jun/12 ] |
|
OK, thanks Alexandre. |