Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-15546

Shared Directory File Creates regression seen in 2.15 when comparing to 2.12.6

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.15.0
    • Lustre 2.15.0
    • None
    • 3
    • 9223372036854775807

    Description

      When testing mdtest 2.15 (2.14.57) and comparing to 2.12.6, I see a large 25% regression with Shared Directory File Creates. Perf traces (attached) show a lot of extra ldlm overhead.

      #!/bin/bash
      
      NODES=21
      PPN=16
      PROCS=$(( $NODES * $PPN ))
      MDT_COUNT=1
      PAUSED=120
      
      srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -E -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -d /mnt/kjlmo2/pkoutoupis/mdt0/test.`date +"%Y%m%d.%H%M%S"` 2>&1 |& tee f_mdt0_0k_ost_shared.out
      
      srun -N $NODES --ntasks-per-node $PPN ~bloewe/benchmarks/ior-3.3.0-CentOS-8.2/install/bin/mdtest -v -i 5 -p $PAUSED -C -w 32768 -E -e 32768 -T -r -n $(( $MDT_COUNT * 1048576 / $PROCS )) -d /mnt/kjlmo2/pkoutoupis/mdt0/test.`date +"%Y%m%d.%H%M%S"` 2>&1 |& tee f_mdt0_32k_ost_shared.out
      

      Attachments

        Issue Links

          Activity

            [LU-15546] Shared Directory File Creates regression seen in 2.15 when comparing to 2.12.6
            pjones Peter Jones added a comment -

            As per the discussion on the LWG call, Etienne's patch (which just landed) is what we will address for 2.15. I suggest that the other patches in flight get moved to a new JIRA for possible inclusion in a future release

            pjones Peter Jones added a comment - As per the discussion on the LWG call, Etienne's patch (which just landed) is what we will address for 2.15. I suggest that the other patches in flight get moved to a new JIRA for possible inclusion in a future release

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46679/
            Subject: LU-15546 mdt: mdt_reint_open lookup before locking
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f14090e56c9d94e3cfaa6f13f357173d6d570547

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46679/ Subject: LU-15546 mdt: mdt_reint_open lookup before locking Project: fs/lustre-release Branch: master Current Patch Set: Commit: f14090e56c9d94e3cfaa6f13f357173d6d570547

            Andreas, I have not made the fixes. I see the bugs that you are referring to now.

            koutoupis Petros Koutoupis added a comment - Andreas, I have not made the fixes. I see the bugs that you are referring to now.

            Petros, did you make fixes to 46696 before testing it? There are a couple of bugs in the current version patch (though clearly described there), but they could be fixed relatively easily, but I have no fast system to benchmark it myself. If you have made fixes, please push a new version, otherwise I will take another crack at it.

            adilger Andreas Dilger added a comment - Petros, did you make fixes to 46696 before testing it? There are a couple of bugs in the current version patch (though clearly described there), but they could be fixed relatively easily, but I have no fast system to benchmark it myself. If you have made fixes, please push a new version, otherwise I will take another crack at it.
            koutoupis Petros Koutoupis added a comment - - edited

            I tested change 46696 and it performs a bit better than just 46679 (on shared directory file creates). It is within 5% of my original baseline. Actually, around 3.5-4% less which I consider great.

            koutoupis Petros Koutoupis added a comment - - edited I tested change 46696 and it performs a bit better than just 46679 (on shared directory file creates). It is within 5% of my original baseline. Actually, around 3.5-4% less which I consider great.

            "Dominique Martinet <qhufhnrynczannqp.f@noclue.notk.org>" uploaded a new patch: https://review.whamcloud.com/46738
            Subject: LU-15546 mdt: optimistically trust non-locked lookup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 55162c4d0fc798b5070d08a670f1e36b575672c6

            gerrit Gerrit Updater added a comment - "Dominique Martinet <qhufhnrynczannqp.f@noclue.notk.org>" uploaded a new patch: https://review.whamcloud.com/46738 Subject: LU-15546 mdt: optimistically trust non-locked lookup Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 55162c4d0fc798b5070d08a670f1e36b575672c6

            koutoupis feel free to play with my patch https://review.whamcloud.com/46696 - it isn't quite ready, just something I whipped together and have not tested, but close to something that would work.

            adilger Andreas Dilger added a comment - koutoupis feel free to play with my patch https://review.whamcloud.com/46696 - it isn't quite ready, just something I whipped together and have not tested, but close to something that would work.

            I tested (mdtest shared directory file creates) change 46679 against my original 2.15 baseline and it is about a 20-22% improvement but still a 5% regression from a revert of LU-10262.

            koutoupis Petros Koutoupis added a comment - I tested (mdtest shared directory file creates) change 46679 against my original 2.15 baseline and it is about a 20-22% improvement but still a 5% regression from a revert of LU-10262 .

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46696
            Subject: LU-15546 mdt: keep history of mdt_reint_open() lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 47cf66750b44a34be3543e0f3e629cbf5103f8da

            gerrit Gerrit Updater added a comment - "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46696 Subject: LU-15546 mdt: keep history of mdt_reint_open() lock Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 47cf66750b44a34be3543e0f3e629cbf5103f8da

            @Andreas Dilger. Of course. I can do so over the next few days.

            koutoupis Petros Koutoupis added a comment - @Andreas Dilger. Of course. I can do so over the next few days.
            adilger Andreas Dilger added a comment - - edited

            koutoupis, would you be able to test this patch in your testbed to see if it resolves the performance problem?

            eaujames, thanks for the patch. One further option that might avoid the extra lookup is to track a history of recent open modes (e.g. most recent 256 opens), and then use that to decide whether to do the pre-lookup or not.

            For workloads like mdtest that never try to recreate existing files the pre-lookup is purely overhead, so defaulting to PW after a number of such opens would avoid this. For many FORTRAN threads, repeatedly opening an existing file with O_CREAT this would default over time to PR. A mixed workload would always do the pre-lookup, which is still a win compared to getting the wrong DLM lock. In the rare case that the decision is wrong (e.g. workload change), it can retry the same as if the lookup was racy. Something like:

            struct mdt_device {
                     unsigned int mdt_open_lock_history;
            };
            #define MDT_OPEN_LOCK_COUNT 256
            /* bias toward LCK_PW since it does not need a full retry loop */
            #define MDT_OPEN_LOCK_THRESH_PW LCK_PW * (MDT_OPEN_LOCK_COUNT - 32)
            #define MDT_OPEN_LOCK_THRESH_PR LCK_PR * (MDT_OPEN_LOCK_COUNT + 8)
            
            static int mdt_init0(const struct lu_env *env, struct mdt_device *m,
                                 struct lu_device_type *ldt, struct lustre_cfg *cfg)
            {
                    mdt->mdt_open_history  = (LCK_PR + LCK_PW) * MDT_OPEN_LOCK_COUNT / 2;
            }
            
            void mdt_open_lock_history_add(struct mdt_device *mdt, enum ldlm_mode mode)
            {
                    mdt->mdt_open_history = (mdt->mdt_open_history * (MDT_OPEN_LOCK_COUNT - 1) + mode) /
                            MDT_OPEN_LOCK_COUNT;
            }
            
            static inline enum ldlm_mode mdt_open_lock_mode(struct mdt_thread_info *info,
            						struct mdt_object *p,
            						struct lu_name *name,
            						u64 open_flags)
            {
            	struct lu_fid fid;
            	enum ldlm_lock_mode mode;
            
            	/* We don't need to take the DLM lock for a volatile */
            	if (open_flags & MDS_OPEN_VOLATILE)
            		return LCK_NL;
            
                    /* if most recent opens used the same mode, assume next one will also */
                    if (info->mti_mdt->mdt_open_history > MDT_OPEN_LOCK_THRESH_PW)
                           return  LCK_PW;
                    if (info->mti_mdt->mdt_open_history > MDT_OPEN_LOCK_THRESH_PR)
                           return  LCK_PR;
            
            	/* If the file exists we only need a read lock on the parent */
            	mode = mdo_lookup(info->mti_env, mdt_object_child(p), name, &fid,
            			&info->mti_spec) == 0 ? LCK_PR : LCK_PW;
            
                    mdt_open_lock_history_add(info->mti_mdt, mode);
            
            	return mode;
            }
            
            int mdt_reint_open(struct mdt_thread_info *info, struct mdt_lock_handle *lhc)
            {
                    if (result == -ENOENT) {
                            if (lh != NULL && lock_mode == LCK_PR) {
                                    /* first pass: get write lock and restart */
                                    mdt_object_unlock(info, parent, lh, 1);
                                    mdt_clear_disposition(info, ldlm_rep, DISP_LOOKUP_NEG);
                                    mdt_lock_handle_init(lh);
                                    lock_mode = LCK_PW;
                                    mdt_open_lock_history_add(info->mti_mdt, lock_mode);
                                    goto again_pw;
                            }
                    }
                    :
                            if (created) {
                                    :
                            } else if (lock_mode == LCK_PW) {
                                    /* LCK_PW was not needed, child already exists */
                                    mdt_open_lock_history_add(info->mti_mdt, lock_mode);
                            }
            }
            
            adilger Andreas Dilger added a comment - - edited koutoupis , would you be able to test this patch in your testbed to see if it resolves the performance problem? eaujames , thanks for the patch. One further option that might avoid the extra lookup is to track a history of recent open modes (e.g. most recent 256 opens), and then use that to decide whether to do the pre-lookup or not. For workloads like mdtest that never try to recreate existing files the pre-lookup is purely overhead, so defaulting to PW after a number of such opens would avoid this. For many FORTRAN threads, repeatedly opening an existing file with O_CREAT this would default over time to PR. A mixed workload would always do the pre-lookup, which is still a win compared to getting the wrong DLM lock. In the rare case that the decision is wrong (e.g. workload change), it can retry the same as if the lookup was racy. Something like: struct mdt_device { unsigned int mdt_open_lock_history; }; #define MDT_OPEN_LOCK_COUNT 256 /* bias toward LCK_PW since it does not need a full retry loop */ #define MDT_OPEN_LOCK_THRESH_PW LCK_PW * (MDT_OPEN_LOCK_COUNT - 32) #define MDT_OPEN_LOCK_THRESH_PR LCK_PR * (MDT_OPEN_LOCK_COUNT + 8) static int mdt_init0( const struct lu_env *env, struct mdt_device *m, struct lu_device_type *ldt, struct lustre_cfg *cfg) { mdt->mdt_open_history = (LCK_PR + LCK_PW) * MDT_OPEN_LOCK_COUNT / 2; } void mdt_open_lock_history_add(struct mdt_device *mdt, enum ldlm_mode mode) { mdt->mdt_open_history = (mdt->mdt_open_history * (MDT_OPEN_LOCK_COUNT - 1) + mode) / MDT_OPEN_LOCK_COUNT; } static inline enum ldlm_mode mdt_open_lock_mode(struct mdt_thread_info *info, struct mdt_object *p, struct lu_name *name, u64 open_flags) { struct lu_fid fid; enum ldlm_lock_mode mode; /* We don't need to take the DLM lock for a volatile */ if (open_flags & MDS_OPEN_VOLATILE) return LCK_NL; /* if most recent opens used the same mode, assume next one will also */ if (info->mti_mdt->mdt_open_history > MDT_OPEN_LOCK_THRESH_PW) return LCK_PW; if (info->mti_mdt->mdt_open_history > MDT_OPEN_LOCK_THRESH_PR) return LCK_PR; /* If the file exists we only need a read lock on the parent */ mode = mdo_lookup(info->mti_env, mdt_object_child(p), name, &fid, &info->mti_spec) == 0 ? LCK_PR : LCK_PW; mdt_open_lock_history_add(info->mti_mdt, mode); return mode; } int mdt_reint_open(struct mdt_thread_info *info, struct mdt_lock_handle *lhc) { if (result == -ENOENT) { if (lh != NULL && lock_mode == LCK_PR) { /* first pass: get write lock and restart */ mdt_object_unlock(info, parent, lh, 1); mdt_clear_disposition(info, ldlm_rep, DISP_LOOKUP_NEG); mdt_lock_handle_init(lh); lock_mode = LCK_PW; mdt_open_lock_history_add(info->mti_mdt, lock_mode); goto again_pw; } } : if (created) { : } else if (lock_mode == LCK_PW) { /* LCK_PW was not needed, child already exists */ mdt_open_lock_history_add(info->mti_mdt, lock_mode); } }

            People

              eaujames Etienne Aujames
              koutoupis Petros Koutoupis
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: