[LU-11115] OST selection algorithm broken with max_create_count=0 or empty OSTs - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.12.0, Lustre 2.10.5
Affects Version/s: Lustre 2.7.0, Lustre 2.10.0
Labels:
None
Environment:
Server running:
CentOS-6 lcentos-release-6-9.el6.12.3.x86_64
lustre-2.7.3-1nasS_mofed33v3g_2.6.32_642.15.1.el6.20170609.x86_64.lustre273.x86_64

Severity:
3

Description

We have blocked new object creation to some of our OSTs with commands like:

lctl set_param osp.$OSTNAME.max_create_count=0

This is to drain data off storage that will be repurposed as spares. Three targets are already at 0%, and e2scan and lester confirm that no objects remain on them. Eleven other targets are blocked while their data is migrated off.

We noticed that a few of the other targets were filling up while others had plenty of space. We have watched this over a few days, and the imbalance is getting worse.

Confirmed that we are using default allocation settings:

nbp7-mds1 ~ # lctl get_param lov.*.qos_*
lov.nbp7-MDT0000-mdtlov.qos_maxage=5 Sec
lov.nbp7-MDT0000-mdtlov.qos_prio_free=91%
lov.nbp7-MDT0000-mdtlov.qos_threshold_rr=17%

Tests creating 100k new files with a stripe count of 1 showed that the fuller OSTs are indeed being allocated objects more often.
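One way to quantify such a test is to tally the OST index of each newly created file. The sketch below assumes the indices have already been collected into an array (for example from `lfs getstripe -i` on each file); the function names `tally_objects` and `print_tally` are made up for illustration and are not part of Lustre.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Tally how many test files landed on each OST.
 * ost_idx[i] is the OST index of file i (assumed to be collected
 * separately); counts[] must have n_osts slots. Out-of-range
 * indices are ignored and their number is returned.
 */
size_t tally_objects(const int *ost_idx, size_t n_files,
                     unsigned long *counts, size_t n_osts)
{
    size_t ignored = 0;

    assert(counts != NULL);

    for (size_t i = 0; i < n_osts; i++)
        counts[i] = 0;

    for (size_t i = 0; i < n_files; i++) {
        if (ost_idx[i] < 0 || (size_t)ost_idx[i] >= n_osts) {
            ignored++;
            continue;
        }
        counts[ost_idx[i]]++;
    }
    return ignored;
}

/* Print one line per OST with its object count and share of all files. */
void print_tally(const unsigned long *counts, size_t n_osts, size_t n_files)
{
    for (size_t i = 0; i < n_osts; i++) {
        double pct = n_files ? 100.0 * (double)counts[i] / (double)n_files : 0.0;

        printf("OST index %zu: %lu objects (%.1f%%)\n", i, counts[i], pct);
    }
}
```

Comparing the per-OST shares against each target's fill level would show whether the fuller OSTs are receiving a disproportionate number of objects.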

This looks like it might be similar to LU-10823.

Issue Links

is related to

LU-11605 create_count stuck in 0 after changeing max_create_count to 0 and back 20 000

Resolved

is related to

LU-4825 lfs migrate not freeing space on OST

Resolved

LU-10823 max_create_count triggering uneven distribution across OSTs

Resolved

Activity

Gerrit Updater added a comment - 02/Aug/18 7:25 PM

John L. Hammond (jhammond@whamcloud.com) merged in patch https://review.whamcloud.com/32859/
Subject: LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: d2f9ed4a20b5fae836560efc607e443fa996c2e2

Jay Lan (Inactive) added a comment - 01/Aug/18 6:20 PM

Peter, we need a nasa label on this ticket. Thanks.

Peter Jones added a comment - 30/Jul/18 10:55 PM

Landed for 2.12

Gerrit Updater added a comment - 30/Jul/18 10:24 PM

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32823/
Subject: LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5b147e47de651f1c140f69314a2d6b56ff6b14d7

Gerrit Updater added a comment - 23/Jul/18 10:01 PM

Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32859
Subject: LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 7889706668c671cdadb8febfe819f7e475bdf257

Jian Yu added a comment - 23/Jul/18 8:22 PM

Sure, Jay.

Jay Lan (Inactive) added a comment - 23/Jul/18 7:07 PM

Hi Jian Yu,

Could you backport your patch to b2_10?

There were conflicts. Thanks!

Changes to be committed:

modified: lustre/lod/lod_qos.c
modified: lustre/osp/osp_precreate.c

Unmerged paths:
(use "git add/rm <file>..." as appropriate to mark resolution)

deleted by us: lustre/include/uapi/linux/lustre/lustre_user.h
both modified: lustre/lod/lod_object.c

Gerrit Updater added a comment - 17/Jul/18 12:12 AM

Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32823
Subject: LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9c390821eec3eea991b820544dce52fba3a73494

Jian Yu added a comment - 12/Jul/18 7:14 AM

The issue can be reproduced.

The Lustre debug log on the MDS shows that, when choosing an OST on which to create an object, lod_qos_prep_create() first tries the QoS algorithm by calling lod_alloc_qos(). If free space is distributed evenly among the OSTs, lod_alloc_qos() returns -EAGAIN, and lod_qos_prep_create() then calls lod_alloc_rr() to use the RR (round-robin) algorithm.

Both lod_alloc_qos() and lod_alloc_rr() call lod_statfs_and_check() to check whether an OST target is available for new OST objects. However, that function does not check for max_create_count=0, so such an OST target is simply returned as available.

This issue affects lod_alloc_qos() but not lod_alloc_rr(), because the RR path runs the following extra code in lod_check_and_reserve_ost(), which checks for and skips OST targets with max_create_count=0:

lod_check_and_reserve_ost()

        /*
         * We expect number of precreated objects in f_ffree at
         * the first iteration, skip OSPs with no objects ready
         */
        if (sfs->os_fprecreated == 0 && speed == 0) {
                QOS_DEBUG("#%d: precreation is empty\n", ost_idx);
                goto out_return;
        }

I'm creating a patch to fix lod_alloc_qos().
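The control flow described in this comment can be sketched as a small self-contained model. This is not the Lustre source: `struct ost_model` and the `model_*` functions are hypothetical stand-ins. The "most free space wins" rule is an assumption that replaces the real QoS weighting to keep the model deterministic. The `skip_blocked` flag only illustrates the effect of skipping max_create_count=0 OSTs in the QoS path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-OST state for this model; not a Lustre structure. */
struct ost_model {
    int idx;                       /* OST index */
    unsigned long long free_kb;    /* free space, used as the QoS weight */
    unsigned int fprecreated;      /* precreated objects ready for use */
    unsigned int max_create_count; /* 0 = creation blocked by the admin */
};

/*
 * Stand-in for the availability check. With skip_blocked == false it
 * behaves as described in the analysis: max_create_count is ignored.
 */
bool model_ost_available(const struct ost_model *o, bool skip_blocked)
{
    return !(skip_blocked && o->max_create_count == 0);
}

/*
 * Toy QoS pass: choose the available OST with the most free space.
 * Returns -EAGAIN when free space is even, so that the caller falls
 * back to round-robin.
 */
int model_alloc_qos(const struct ost_model *osts, size_t n, bool skip_blocked)
{
    unsigned long long min_free = 0, max_free = 0;
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        unsigned long long f = osts[i].free_kb;

        if (!model_ost_available(&osts[i], skip_blocked))
            continue;
        if (best < 0) {
            min_free = max_free = f;
            best = (int)i;
            continue;
        }
        if (f > max_free) {
            max_free = f;
            best = (int)i;
        }
        if (f < min_free)
            min_free = f;
    }
    if (best < 0)
        return -ENOSPC;
    if (min_free == max_free)
        return -EAGAIN;
    return osts[best].idx;
}

/*
 * Toy RR pass: walk the OSTs from *cursor and skip any with no
 * precreated objects, mirroring the os_fprecreated check quoted above.
 */
int model_alloc_rr(const struct ost_model *osts, size_t n, size_t *cursor)
{
    assert(cursor != NULL);

    for (size_t tried = 0; tried < n; tried++) {
        size_t pos = (*cursor + tried) % n;

        if (osts[pos].fprecreated == 0)
            continue;
        *cursor = (pos + 1) % n;
        return osts[pos].idx;
    }
    return -ENOSPC;
}

/* Try QoS first and fall back to RR on -EAGAIN, as in the analysis. */
int model_prep_create(const struct ost_model *osts, size_t n,
                      bool skip_blocked, size_t *cursor)
{
    int rc = model_alloc_qos(osts, n, skip_blocked);

    if (rc == -EAGAIN)
        rc = model_alloc_rr(osts, n, cursor);
    return rc;
}
```

In this model, a drained OST with max_create_count=0 and the most free space is chosen by the QoS pass when `skip_blocked` is false, while the RR pass never returns it because it has no precreated objects.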

Jian Yu added a comment - 05/Jul/18 9:21 PM

Sure, Nathan. Let me reproduce and investigate further. Have a nice vacation!

People

Assignee: Jian Yu

Reporter: Nathan Dauchy (Inactive)

Votes: 0

Watchers: 8

Dates

Created: 03/Jul/18 10:43 PM

Updated: 15/Apr/19 7:32 PM

Resolved: 30/Jul/18 10:55 PM