
OST selection algorithm broken with max_create_count=0 or empty OSTs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.5
    • Affects Version/s: Lustre 2.7.0, Lustre 2.10.0
    • Labels: None
    • Environment: Server running:
      CentOS-6 lcentos-release-6-9.el6.12.3.x86_64
      lustre-2.7.3-1nasS_mofed33v3g_2.6.32_642.15.1.el6.20170609.x86_64.lustre273.x86_64
    • Severity: 3

    Description

      We have blocked new object creation to some of our OSTs with commands like:

      lctl set_param osp.$OSTNAME.max_create_count=0

      This is to drain data from storage that will be repurposed as spares. Three targets are already at 0% usage and have been confirmed, with e2scan and lester, to hold no remaining objects. Eleven other targets are blocked while their data is migrated off.

      We noticed that a few of the other targets were filling up while others still had plenty of space. After watching for a few days, the imbalance is getting worse.

      Confirmed that we are using default allocation settings:

      nbp7-mds1 ~ # lctl get_param lov.*.qos_* 
      lov.nbp7-MDT0000-mdtlov.qos_maxage=5 Sec
      lov.nbp7-MDT0000-mdtlov.qos_prio_free=91%
      lov.nbp7-MDT0000-mdtlov.qos_threshold_rr=17%

      Tests creating 100k new files with a stripe count of 1 showed that the fuller OSTs are indeed being allocated objects more often.

      This looks like it might be similar to LU-10823.

       

          Activity

            [LU-11115] OST selection algorithm broken with max_create_count=0 or empty OSTs

            gerrit Gerrit Updater added a comment -

            John L. Hammond (jhammond@whamcloud.com) merged in patch https://review.whamcloud.com/32859/
            Subject: LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: d2f9ed4a20b5fae836560efc607e443fa996c2e2

            jaylan Jay Lan (Inactive) added a comment -

            Peter, we need a nasa label on this ticket. Thanks.

            pjones Peter Jones added a comment -

            Landed for 2.12

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32823/
            Subject: LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5b147e47de651f1c140f69314a2d6b56ff6b14d7

            gerrit Gerrit Updater added a comment -

            Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32859
            Subject: LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 7889706668c671cdadb8febfe819f7e475bdf257

            yujian Jian Yu added a comment -

            Sure, Jay.

            jaylan Jay Lan (Inactive) added a comment -

            Hi Jian Yu,

            Could you backport your patch to b2_10? There were conflicts. Thanks!

            Changes to be committed:

            modified: lustre/lod/lod_qos.c
            modified: lustre/osp/osp_precreate.c

            Unmerged paths:
            (use "git add/rm <file>..." as appropriate to mark resolution)

            deleted by us: lustre/include/uapi/linux/lustre/lustre_user.h
            both modified: lustre/lod/lod_object.c

            gerrit Gerrit Updater added a comment -

            Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32823
            Subject: LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9c390821eec3eea991b820544dce52fba3a73494

            yujian Jian Yu added a comment -

            The issue can be reproduced.

            The Lustre debug log on the MDS shows that, when choosing an OST for a new object, lod_qos_prep_create() tries the QoS algorithm first by calling lod_alloc_qos(). If free space is distributed evenly among the OSTs, lod_alloc_qos() returns -EAGAIN, and lod_qos_prep_create() then calls lod_alloc_rr() to use the RR algorithm.
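
The call order described above can be sketched in C as follows. This is a minimal illustration and not Lustre code: the sketch_* functions, the alloc_algo enum, and the space_is_balanced flag are hypothetical stand-ins for lod_alloc_qos(), lod_alloc_rr(), and lod_qos_prep_create(), and the only behavior taken from the analysis is that RR is tried when QoS returns -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Which allocator ended up handling the request (illustrative only). */
enum alloc_algo { ALGO_QOS, ALGO_RR };

/*
 * Hypothetical stand-in for lod_alloc_qos(): it declines with -EAGAIN
 * when free space is evenly distributed among the OSTs.
 */
static int sketch_alloc_qos(int space_is_balanced)
{
	return space_is_balanced ? -EAGAIN : 0;
}

/* Hypothetical stand-in for lod_alloc_rr(): always succeeds in this sketch. */
static int sketch_alloc_rr(void)
{
	return 0;
}

/*
 * Mirrors the order described for lod_qos_prep_create(): try QoS first,
 * and use round-robin only when QoS returns -EAGAIN.
 */
static int sketch_prep_create(int space_is_balanced, enum alloc_algo *used)
{
	int rc;

	assert(used != NULL);

	*used = ALGO_QOS;
	rc = sketch_alloc_qos(space_is_balanced);
	if (rc == -EAGAIN) {
		*used = ALGO_RR;
		rc = sketch_alloc_rr();
	}
	return rc;
}
```

Calling sketch_prep_create(1, &used) leaves used set to ALGO_RR, while sketch_prep_create(0, &used) leaves it at ALGO_QOS, so an availability check that exists only on the RR path is never reached when space is unbalanced.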

            Both lod_alloc_qos() and lod_alloc_rr() use lod_statfs_and_check() to determine whether an OST is available for new objects. However, that function does not check for max_create_count=0, so an OST with that setting is still returned as available.

            This issue affects lod_alloc_qos() but not lod_alloc_rr(), because the RR path also calls lod_check_and_reserve_ost(), which contains the following extra check that skips an OST with max_create_count=0:

            lod_check_and_reserve_ost()
                    /*
                     * We expect number of precreated objects in f_ffree at
                     * the first iteration, skip OSPs with no objects ready
                     */
                    if (sfs->os_fprecreated == 0 && speed == 0) {
                            QOS_DEBUG("#%d: precreation is empty\n", ost_idx);
                            goto out_return;
                    }
            

            I'm creating a patch to fix lod_alloc_qos().
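
As a rough illustration of the kind of check the QoS path is missing, the sketch below filters out targets whose max_create_count is 0 before choosing one. It is not the actual patch: struct sketch_ost, its fields, and both functions are hypothetical, and picking the usable OST with the most free space is a deliberate simplification rather than the selection logic of lod_alloc_qos().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one OST as seen by an allocator. */
struct sketch_ost {
	uint64_t free_bytes;        /* free space reported for the target */
	uint32_t max_create_count;  /* 0 means object creation is disabled */
	int      active;            /* nonzero if the target is reachable */
};

/* An OST is usable only if it is active and creation is not disabled. */
static int sketch_ost_usable(const struct sketch_ost *ost)
{
	return ost->active && ost->max_create_count != 0;
}

/*
 * Return the index of the usable OST with the most free space,
 * or -1 if no OST can accept new objects.
 */
static int sketch_pick_ost(const struct sketch_ost *osts, size_t n)
{
	int best = -1;
	size_t i;

	assert(osts != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!sketch_ost_usable(&osts[i]))
			continue;	/* skip drained or unreachable targets */
		if (best < 0 || osts[i].free_bytes > osts[best].free_bytes)
			best = (int)i;
	}
	return best;
}
```

Without the max_create_count test in sketch_ost_usable(), a drained target that reports the most free space would win every comparison even though it can never supply an object.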

            yujian Jian Yu added a comment -

            Sure, Nathan. Let me reproduce and investigate further. Have a nice vacation!


            People

              Assignee: yujian Jian Yu
              Reporter: ndauchy Nathan Dauchy (Inactive)
              Votes: 0
              Watchers: 8
