[LU-7780] MDS crashed with oom-killer Created: 16/Feb/16 Updated: 06/Jun/16 Resolved: 16/Mar/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Frank Heckes (Inactive) | Assignee: | Frank Heckes (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | soak |
| Environment: | lola |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The error happened during soak testing of build '20160215' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20150215). DNE is enabled. Please note that this build is a vanilla build of the master branch.
Attached are the messages and console files of lola-11, as well as the sorted slab usage captured when the oom-killer was started. |
| Comments |
| Comment by Frank Heckes (Inactive) [ 22/Feb/16 ] |
|
Three (different) MDSes crashed after the oom-killer was started over the weekend (2016-02-20 - 2016-02-21). |
| Comment by Oleg Drokin [ 22/Feb/16 ] |
|
Can you please test the tip of b2_8 to see if it's also affected? Also, on master: if you can, enable memory allocation tracing, reproduce, and then extract the debug log out of a crashdump so that we can see where all of those allocations come from. |
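For reference, the log-collection side of that request could look roughly like the sketch below, assuming the +malloc tracing has already been enabled on the MDS. This only covers a live node; for a crashdump the same debug buffer has to be extracted from the vmcore (e.g. with the crash utility), which is not shown here.

```
# Clear the kernel debug buffer before starting the reproducer
lctl clear

# ... run the soak workload until the allocations start growing ...

# Dump the accumulated debug log (including the +malloc records) to a file
lctl dk /tmp/lustre-debug.$(hostname).log
```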
| Comment by Frank Heckes (Inactive) [ 07/Mar/16 ] |
|
Currently '+malloc' has been added to the debug mask on the server nodes. |
| Comment by Andreas Dilger [ 14/Mar/16 ] |
|
Frank or Cliff, Di: the memory is all used in ptlrpc_cache and in large (1MB and 512KB) slab allocations. My first guess is that these large allocations relate to striped directory RPCs (i.e. OUT) and recovery, and are probably not being freed except at shutdown. There are 30k 1MB allocations consuming a whopping 30GB of memory, 400k ptlrpc_cache entries consuming 300MB of memory, and 230k 1KB allocations consuming 235MB. All of those slabs are growing continuously from startup, yet there aren't any objects (ldlm_lock, ldlm_resource, ldiskfs_inode, buffer_head, dentry, etc.) that might normally grow as the node is caching a lot of data.

It might be that we are just keeping too many requests in ptlrpc_cache? In that case, we might need to add a slab shrinker for ptlrpc_cache to keep the size of this slab in check, but I'm not sure this is the root problem because it seems all of the ptlrpc_cache items are in use, so it may be that there is a stray reference on the request that isn't being put somewhere?

It is also a bit sad that the cache name is ptlrpc_cache but the internal usage is request_cache. The slab cache name should be changed to ptlrpc_request_cache to match the code so that grep can find all of the relevant code at once.

Is it possible that ptlrpc_history is enabled on these nodes? That would also keep a lot of requests pinned in memory in ptlrpc_server_drop_request(). Normally this wouldn't be a problem, but if there are so many large buffers attached to the RPCs this could be causing the OOM, when smaller (e.g. 32KB) buffers wouldn't be a problem. It also isn't clear that there is a benefit to keeping the request buffers in the request history, since they are never used, so it might also be possible to release the buffers before inserting the request into the history, but I'm not sure if that is required here or not? |
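The pattern described above can be checked directly from the shell while the node is still up. The commands below are only a sketch: the generic size-* slab names assume the RHEL 6 SLAB allocator used on lola, and the req_buffer_history parameter paths are an assumption that may differ between Lustre versions.

```
# Watch the suspect slab caches grow (ptlrpc_cache plus the large generic slabs)
grep -E '^(ptlrpc_cache|size-1048576|size-524288|size-1024) ' /proc/slabinfo

# Or sort all slab caches by total cache size interactively
slabtop -s c

# Check whether a large ptlrpc request history is configured on the services
lctl get_param '*.*.*.req_buffer_history_max' '*.*.*.req_buffer_history_len'
```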
| Comment by Frank Heckes (Inactive) [ 15/Mar/16 ] |
|
The debug filter has been changed to: debug_mb=128 debug=+malloc +trace |
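A sketch of how that filter is typically applied with lctl; the '+' prefix adds the flags to the debug mask already in effect rather than replacing it.

```
# Enlarge the Lustre debug buffer to 128 MB so the trace is not overwritten too quickly
lctl set_param debug_mb=128

# Add allocation and trace messages to the existing debug mask
lctl set_param debug="+malloc +trace"

# Verify the resulting settings
lctl get_param debug debug_mb
```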
| Comment by Peter Jones [ 16/Mar/16 ] |
|
Moving to 2.9 because it seems that this issue only occurs with multiple MDTs per MDS and does not happen with the more common configuration of a single MDT per MDS. Is this a duplicate of LU-7836? |
| Comment by Di Wang [ 16/Mar/16 ] |
|
Yes, this is a duplicate of LU-7836. |
| Comment by Peter Jones [ 16/Mar/16 ] |
|
ok thanks Di |