[LU-2708] MDS thrashing in ptlrpc_alloc_rqbd Created: 29/Jan/13 Updated: 06/Mar/13 Resolved: 06/Mar/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Christopher Morrone | Assignee: | Liang Zhen (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB, sequoia, topsequoia | ||
| Environment: |
Lustre 2.3.58-6chaos (github.com/chaos/lustre) on MDS. |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Rank (Obsolete): | 6314 | ||||||||
| Description |
|
We have had some problems in recent weeks with the MDS on grove (sequoia's filesystem cluster) thrashing for anywhere from minutes to many hours while under load. While it does so, it does not appear to be handling traffic very quickly, and the node load is so high that login is nearly impossible. I caught it doing that for a while today during testing and dumped some SysRq info to the console. It looks to me like the active tasks may be spending too much time under ptlrpc_alloc_rqbd() doing vmallocs. Prakash had a patch to move those allocations to a slab, but it became time-consuming to keep moving it forward. We may need to look at reviving that. See attached file "console.grove-mds1.txt.bz2". |
| Comments |
| Comment by Christopher Morrone [ 29/Jan/13 ] |
|
Related to |
| Comment by Liang Zhen (Inactive) [ 29/Jan/13 ] |
|
I have posted another patch: http://review.whamcloud.com/#change,4940 |
| Comment by Liang Zhen (Inactive) [ 06/Mar/13 ] |
|
We have landed two patches for this: |