[LU-6587] refactor OBD_ALLOC_LARGE to always do kmalloc first Created: 11/May/15 Updated: 05/Feb/16 Resolved: 05/Feb/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oleg Drokin | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
Since vmalloc allocations are notoriously unscalable and slow, we should do what other filesystems are doing (and what we converted upstream kernel client to do) and for large allocations we should always try kmalloc first (with NOWARN flag) and then if that fails, retry with vmalloc. |
| Comments |
| Comment by Jian Yu [ 28/May/15 ] |
|
Yang Sheng is creating the patch and will push it to Gerrit. |
| Comment by Gerrit Updater [ 28/May/15 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/14994 |
| Comment by Gerrit Updater [ 24/Jun/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14994/ |
| Comment by Yang Sheng [ 24/Jun/15 ] |
|
The next step of this ticket is replacing OBD_FREE_LARGE to OBD_FREE. Either all in one or one by one as subsystem. But i think we need a grace period for every one know that. So could we close this ticket first? |
| Comment by James A Simmons [ 24/Jun/15 ] |
|
Looking at patches by their sublayer the areas under going the most changes are llite, osp, and target. I would recommend doing a patch set with that combination that could perhaps be landed after feature freeze. The rest of the OBD_FREE_LARGE users appear to be safe to update. I noticed ptlrpc and obdclass are large users of OBD_FREE_LARGE so if you might wanted to do a patch set covering those regions. |
| Comment by Yang Sheng [ 25/Jun/15 ] |
|
Thanks for your comment, James. I think it is good idea to make a patch against stable code first. |
| Comment by Peter Jones [ 23/Nov/15 ] |
|
It looks like this is landed for 2.8 |
| Comment by Andreas Dilger [ 15/Jan/16 ] |
|
I noticed that there was something landed in the http://review.whamcloud.com/14994 patch that I don't really like. Moving is_vmalloc_addr() into OBD_FREE() means that there is the overhead of this check for every call to OBD_FREE(), even though this is not needed for the large majority of allocations. It would be better to have the is_vmalloc_addr() check only in OBD_FREE_LARGE() and remove it from OBD_FREE(). The deprecation of OBD_FREE_LARGE() should also be removed from checkpatch.pl. There are a couple of patches that have landed since then that need to be converted back to using OBD_FREE_LARGE(), at least: af13cfff297d4882de35fb7c11bf5261293b8287 09141c0796802e7a3471c084ea5928674b3a1862 |
| Comment by Andreas Dilger [ 15/Jan/16 ] |
|
Since this is only a few lines of change, it would be good to get this fixed before 2.9 if possible. |
| Comment by Gerrit Updater [ 18/Jan/16 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/18034 |
| Comment by Oleg Drokin [ 20/Jan/16 ] |
static inline int is_vmalloc_addr(const void *x) { #ifdef CONFIG_MMU unsigned long addr = (unsigned long)x; return addr >= VMALLOC_START && addr < VMALLOC_END; #else return 0; #endif } So I believe there's next to no overhead in is_vmalloc_addr esp. compared to other stuff we do in OBD_FREE() I'd be instead inclined to drop OBD_ALLOC_LARGE and just always do OBD_ALLOC that would fallbacl to vmalloc if regular alloc failed? |
| Comment by Andreas Dilger [ 20/Jan/16 ] |
|
Sure there is little overhead to check is_vmalloc_addr() for every single allocation, but not zero, so why would we use it everywhere when it is not needed. Also, in the upstream kernel these macros are gone and we don't want to have the vmalloc(( fallback and calls to is_vmalloc_addr() check everywhere in the code simply because we don't know anymore which allocations might be large and which ones are not. We'd be taking a huge step backwards in terms of code quality, and a patch like that would never be accepted upstream, so why would we do it in master just because it can hide inside a macro? Having huge allocations that require vmalloc() is something we should be very aware of and make a conscious decision to do, as otherwise we might be wasting a lot of memory accidentally. |
| Comment by James A Simmons [ 21/Jan/16 ] |
|
Doesn't OBD_ALLOC_LARGE always try a kmalloc first then if it fails try a vmalloc? I don't think we can just drop the is_vmalloc_addr so easily. I agree with you Andreas that the vmalloc fallbacks are bad news but instead of trying to modify OBD_FREE_LARGE why don't we ripe off the band aid and start using vmalloc directly when it really needed. Note the upstream client has function wrapper libcfs_kvzalloc() and libcfs_kvzalloc_cpt() in libcfs to replace all those macros. Honestly I like to just use vmalloc directly. |
| Comment by Oleg Drokin [ 21/Jan/16 ] |
|
James: The problem with vmalloc is that it is expensive, it's all done under a single lock for one. Andreas: You are right that the overhead is there, even though it is pretty much negligible and for as long as we retain OBD_ALLOC_LARGE/FREE it makes the most sense to have the check in there and not in the OBD_ALLOC/FREE |
| Comment by Andreas Dilger [ 21/Jan/16 ] |
|
James, I'd be happy to use the libcfs_kvzalloc() and libcfs_kbfree() in master as well. We definitely can't use vmalloc() for everything in Lustre, since the performance penalty would be very high as Oleg wrote. Those helpers should still be wrapped as OBD_ALLOC_LARGE() and OBD_FREE_LARGE() to account memory allocations so that we can find memory leaks in the code. In either case, we still need to know that "large" allocations need to be freed differently from normal ones. |
| Comment by James A Simmons [ 21/Jan/16 ] |
|
Sorry I wasn't completely clear at what I was purposing. I was considering the idea of dropping vmalloc usage completely unless the allocation is more than 4MB. It was only those cases I would consider looking at vmalloc usage directly. Sorry for the confusion. As for memory accounting that is a separate topic for 2.9 but it is a good topic. We really should look at more standard ways to detect kernel leaks which are far more powerful than what lustre does today. I will open a separate ticket for that. |
| Comment by Gerrit Updater [ 05/Feb/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18034/ |
| Comment by Joseph Gmitter (Inactive) [ 05/Feb/16 ] |
|
Patch has landed for 2.8 |