Details

    Description

      The lvbo_*() methods have to reallocate lu_env on every call, which can be quite expensive in CPU cycles at scale.
      The layers above can pass in an lu_env so that the existing one is reused.
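As a rough illustration of the calling convention being proposed, here is a minimal userspace sketch. struct lu_env and the *_stub helpers below are stand-ins invented for this example (the real lu_env_init()/lu_env_fini() live in the Lustre tree); the point is that the callee reuses a caller-supplied env and only pays for a local init/fini pair when none is passed, which is the same fallback pattern the patch discussed later in this ticket uses in ptlrpc_set_wait().

```c
#include <stddef.h>

/* Stand-in for the real struct lu_env from the Lustre tree. */
struct lu_env {
	int initialized;
};

static int env_inits;	/* counts how often a local env had to be set up */

/* Stubs standing in for lu_env_init()/lu_env_fini(). */
static int lu_env_init_stub(struct lu_env *env)
{
	env_inits++;
	env->initialized = 1;
	return 0;
}

static void lu_env_fini_stub(struct lu_env *env)
{
	env->initialized = 0;
}

/* Callers that already hold an env pass it in; only when env is NULL do we
 * pay for a local init/fini pair, mirroring the fallback in ptlrpc_set_wait(). */
static int do_rpc_work(const struct lu_env *env)
{
	struct lu_env _env;
	int rc = 0;

	if (env == NULL) {
		rc = lu_env_init_stub(&_env);
		if (rc != 0)
			return rc;
		env = &_env;
	}

	/* ... real work would use env here ... */

	if (env == &_env)
		lu_env_fini_stub(&_env);
	return rc;
}
```

With this shape, a hot path that already owns an env (e.g. an upper layer doing many RPCs) never touches init/fini, while callers without one still work unchanged.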

    Activity

            [LU-11164] lvbo_*() methods to reuse env
            pjones Peter Jones added a comment -

            Thanks Andrew!

            panda Andrew Perepechko added a comment - bzzz, done: https://jira.whamcloud.com/browse/LU-12034

            bzzz Alex Zhuravlev added a comment -

            panda, would you please create a new ticket like Peter suggested?
            pjones Peter Jones added a comment -

            Given that LU-11164 landed in 2.12.0, which is now released, it would be very helpful to move this discussion to a new Jira ticket linked to this one. That will make it easier to track getting the fix landed and included in a 2.12.x maintenance release - thanks!

            bzzz Alex Zhuravlev added a comment (edited) - panda, I'm testing https://review.whamcloud.com/#/c/34333/

            panda Andrew Perepechko added a comment -

            Alex, Shuichi, what do you think about removing this allocation? Is it crucial for this patch?
            panda Andrew Perepechko added a comment (edited) -

            The following change restores createmany -o performance to that of pre LU-11164 for me:

            diff --git a/lustre/ptlrpc/client.c b/lustre/ptlrpc/client.c
            index 1ea3b9fdc1..f2a994c707 100644
            --- a/lustre/ptlrpc/client.c
            +++ b/lustre/ptlrpc/client.c
            @@ -2327,7 +2327,9 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
                    struct list_head *tmp;
                    struct ptlrpc_request *req;
                    struct l_wait_info lwi;
            +#if 0
                    struct lu_env _env;
            +#endif
                    time64_t timeout;
                    int rc;
                    ENTRY;
            @@ -2345,6 +2347,7 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
                    if (list_empty(&set->set_requests))
                             RETURN(0);
             
            +#if 0
                    /* ideally we want env provide by the caller all the time,
                     * but at the moment that would mean a massive change in
                     * LDLM while benefits would be close to zero, so just
            @@ -2356,6 +2359,7 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
                                    RETURN(rc);
                            env = &_env;
                    }
            +#endif
             
                     do {
                             timeout = ptlrpc_set_next_timeout(set);
            @@ -2436,8 +2440,10 @@ int ptlrpc_set_wait(const struct lu_env *env, struct ptlrpc_request_set *set)
                                     rc = req->rq_status;
                     }
             
            +#if 0
                    if (env && env == &_env)
                            lu_env_fini(&_env);
            +#endif
             
                    RETURN(rc);
             }
            
            panda Andrew Perepechko added a comment (edited) -

            I'm not an expert in this patch, but isn't it performing lu_env_init()/lu_env_fini() in ptlrpc_set_wait() when this sequence isn't really needed?

            This observation seems to be backed by the posted profiling data:

            mdt_close() execution times are equal with and without the patch (146 us vs 148 us), and the differences between ll_file_release() and mdc_close() execution times are equal too: 463 - 442 = 21 = 428 - 407.

            Apparently, the observed performance drop happens between mdc_close() and mdt_close(), which correlates well with the suspected lu_env_init()/lu_env_fini() overhead in ptlrpc_set_wait().
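The subtraction above can be checked mechanically. In this sketch the variable names and the a/b pairing (which run is patched vs. unpatched) are my labels; the microsecond values come from the comment. Equal per-layer deltas mean the extra time is spent below mdc_close().

```c
/* Per-layer timing deltas from the profiling data quoted above,
 * in microseconds. The a/b run labels are an assumption; only the
 * deltas matter for the argument. */
static const int ll_file_release_a = 463, mdc_close_a = 442;
static const int ll_file_release_b = 428, mdc_close_b = 407;

/* Time attributable to the span between an outer and an inner layer. */
static int layer_delta(int outer, int inner)
{
	return outer - inner;
}
```

Since layer_delta() is 21 us for both runs, the ll_file_release()-to-mdc_close() span is unchanged, pointing the regression at the RPC path underneath.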

            panda Andrew Perepechko added a comment (edited) -

            > Still create ost objects unless there is -m (mknod) option? let me confirm with/without patch.

            The directory in the test is configured for the DoM layout:

            # lfs setstripe -E 1048576 -L mdt .
            # createmany -o file 1 20000
            

            sihara Shuichi Ihara added a comment -

            Still create ost objects unless there is -m (mknod) option? let me confirm with/without patch.

            People

              bzzz Alex Zhuravlev
              Votes: 0
              Watchers: 9