Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6587

refactor OBD_ALLOC_LARGE to always do kmalloc first

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.8.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      Since vmalloc allocations are notoriously unscalable and slow, we should do what other filesystems are doing (and what we converted upstream kernel client to do) and for large allocations we should always try kmalloc first (with NOWARN flag) and then if that fails, retry with vmalloc.

      Attachments

        Issue Links

          Activity

            [LU-6587] refactor OBD_ALLOC_LARGE to always do kmalloc first

            Patch has landed for 2.8

            jgmitter Joseph Gmitter (Inactive) added a comment - Patch has landed for 2.8

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18034/
            Subject: LU-6587 obdclass: use OBD_FREE_LARGE with OBD_ALLOC_LARGE
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 0b1ad400c8f64575292a7ff54a8ce872a124b19e

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18034/ Subject: LU-6587 obdclass: use OBD_FREE_LARGE with OBD_ALLOC_LARGE Project: fs/lustre-release Branch: master Current Patch Set: Commit: 0b1ad400c8f64575292a7ff54a8ce872a124b19e

            Sorry I wasn't completely clear at what I was purposing. I was considering the idea of dropping vmalloc usage completely unless the allocation is more than 4MB. It was only those cases I would consider looking at vmalloc usage directly. Sorry for the confusion.

            As for memory accounting that is a separate topic for 2.9 but it is a good topic. We really should look at more standard ways to detect kernel leaks which are far more powerful than what lustre does today. I will open a separate ticket for that.

            simmonsja James A Simmons added a comment - Sorry I wasn't completely clear at what I was purposing. I was considering the idea of dropping vmalloc usage completely unless the allocation is more than 4MB. It was only those cases I would consider looking at vmalloc usage directly. Sorry for the confusion. As for memory accounting that is a separate topic for 2.9 but it is a good topic. We really should look at more standard ways to detect kernel leaks which are far more powerful than what lustre does today. I will open a separate ticket for that.

            James, I'd be happy to use the libcfs_kvzalloc() and libcfs_kbfree() in master as well. We definitely can't use vmalloc() for everything in Lustre, since the performance penalty would be very high as Oleg wrote. Those helpers should still be wrapped as OBD_ALLOC_LARGE() and OBD_FREE_LARGE() to account memory allocations so that we can find memory leaks in the code. In either case, we still need to know that "large" allocations need to be freed differently from normal ones.

            adilger Andreas Dilger added a comment - James, I'd be happy to use the libcfs_kvzalloc() and libcfs_kbfree() in master as well. We definitely can't use vmalloc() for everything in Lustre, since the performance penalty would be very high as Oleg wrote. Those helpers should still be wrapped as OBD_ALLOC_LARGE() and OBD_FREE_LARGE() to account memory allocations so that we can find memory leaks in the code. In either case, we still need to know that "large" allocations need to be freed differently from normal ones.
            green Oleg Drokin added a comment -

            James: The problem with vmalloc is that it is expensive, it's all done under a single lock for one.
            As such trying kmalloc first has a big advantage of speed should there be big enough chunk of memory really available.

            Andreas: You are right that the overhead is there, even though it is pretty much negligible and for as long as we retain OBD_ALLOC_LARGE/FREE it makes the most sense to have the check in there and not in the OBD_ALLOC/FREE

            green Oleg Drokin added a comment - James: The problem with vmalloc is that it is expensive, it's all done under a single lock for one. As such trying kmalloc first has a big advantage of speed should there be big enough chunk of memory really available. Andreas: You are right that the overhead is there, even though it is pretty much negligible and for as long as we retain OBD_ALLOC_LARGE/FREE it makes the most sense to have the check in there and not in the OBD_ALLOC/FREE

            Doesn't OBD_ALLOC_LARGE always try a kmalloc first then if it fails try a vmalloc? I don't think we can just drop the is_vmalloc_addr so easily. I agree with you Andreas that the vmalloc fallbacks are bad news but instead of trying to modify OBD_FREE_LARGE why don't we ripe off the band aid and start using vmalloc directly when it really needed. Note the upstream client has function wrapper libcfs_kvzalloc() and libcfs_kvzalloc_cpt() in libcfs to replace all those macros. Honestly I like to just use vmalloc directly.

            simmonsja James A Simmons added a comment - Doesn't OBD_ALLOC_LARGE always try a kmalloc first then if it fails try a vmalloc? I don't think we can just drop the is_vmalloc_addr so easily. I agree with you Andreas that the vmalloc fallbacks are bad news but instead of trying to modify OBD_FREE_LARGE why don't we ripe off the band aid and start using vmalloc directly when it really needed. Note the upstream client has function wrapper libcfs_kvzalloc() and libcfs_kvzalloc_cpt() in libcfs to replace all those macros. Honestly I like to just use vmalloc directly.

            Sure there is little overhead to check is_vmalloc_addr() for every single allocation, but not zero, so why would we use it everywhere when it is not needed. Also, in the upstream kernel these macros are gone and we don't want to have the vmalloc(( fallback and calls to is_vmalloc_addr() check everywhere in the code simply because we don't know anymore which allocations might be large and which ones are not. We'd be taking a huge step backwards in terms of code quality, and a patch like that would never be accepted upstream, so why would we do it in master just because it can hide inside a macro?

            Having huge allocations that require vmalloc() is something we should be very aware of and make a conscious decision to do, as otherwise we might be wasting a lot of memory accidentally.

            adilger Andreas Dilger added a comment - Sure there is little overhead to check is_vmalloc_addr() for every single allocation, but not zero, so why would we use it everywhere when it is not needed. Also, in the upstream kernel these macros are gone and we don't want to have the vmalloc(( fallback and calls to is_vmalloc_addr() check everywhere in the code simply because we don't know anymore which allocations might be large and which ones are not. We'd be taking a huge step backwards in terms of code quality, and a patch like that would never be accepted upstream, so why would we do it in master just because it can hide inside a macro? Having huge allocations that require vmalloc() is something we should be very aware of and make a conscious decision to do, as otherwise we might be wasting a lot of memory accidentally.
            green Oleg Drokin added a comment -
            static inline int is_vmalloc_addr(const void *x)
            {
            #ifdef CONFIG_MMU
                    unsigned long addr = (unsigned long)x;
            
                    return addr >= VMALLOC_START && addr < VMALLOC_END;
            #else
                    return 0;
            #endif
            }
            

            So I believe there's next to no overhead in is_vmalloc_addr esp. compared to other stuff we do in OBD_FREE()

            I'd be instead inclined to drop OBD_ALLOC_LARGE and just always do OBD_ALLOC that would fallbacl to vmalloc if regular alloc failed?

            green Oleg Drokin added a comment - static inline int is_vmalloc_addr( const void *x) { #ifdef CONFIG_MMU unsigned long addr = (unsigned long )x; return addr >= VMALLOC_START && addr < VMALLOC_END; # else return 0; #endif } So I believe there's next to no overhead in is_vmalloc_addr esp. compared to other stuff we do in OBD_FREE() I'd be instead inclined to drop OBD_ALLOC_LARGE and just always do OBD_ALLOC that would fallbacl to vmalloc if regular alloc failed?

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/18034
            Subject: LU-6587 obdclass: use OBD_FREE_LARGE with OBD_ALLOC_LARGE
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a7ac09582c72dad0be99db2072971dd34d6d6165

            gerrit Gerrit Updater added a comment - Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/18034 Subject: LU-6587 obdclass: use OBD_FREE_LARGE with OBD_ALLOC_LARGE Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: a7ac09582c72dad0be99db2072971dd34d6d6165

            Since this is only a few lines of change, it would be good to get this fixed before 2.9 if possible.

            adilger Andreas Dilger added a comment - Since this is only a few lines of change, it would be good to get this fixed before 2.9 if possible.

            People

              ys Yang Sheng
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: