[LU-9450] precreate logic badness between lod_statfs_and_check() and lod_check_and_reserve_ost()
Created: 04/May/17  Updated: 05/May/17  Resolved: 05/May/17

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.10.0

Type: Bug
Priority: Critical
Reporter: John Hammond
Assignee: WC Triage
Resolution: Not a Bug
Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-9096 sanity test_253: File creation failed... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

If (sfs->os_state & OS_STATE_ENOINO) && sfs->os_fprecreated == 0 holds in lod_statfs_and_check(), the function returns -ENOSPC, which makes lod_check_and_reserve_ost() return early. It therefore seems that the precreate thread is never woken up and that this becomes a permanent condition:

static int lod_statfs_and_check(const struct lu_env *env, struct lod_device *d,
                                int index, struct obd_statfs *sfs)
{
        struct lod_tgt_desc *ost;
        int                  rc;
        ENTRY;

        LASSERT(d);
        ost = OST_TGT(d,index);
        LASSERT(ost);

        rc = dt_statfs(env, ost->ltd_ost, sfs);

        if (rc == 0 && ((sfs->os_state & OS_STATE_ENOSPC) ||
            (sfs->os_state & OS_STATE_ENOINO && sfs->os_fprecreated == 0)))
                RETURN(-ENOSPC);

        ...
}

static int lod_check_and_reserve_ost(const struct lu_env *env,
                                     struct lod_device *m,
                                     struct obd_statfs *sfs, __u32 ost_idx,
                                     __u32 speed, __u32 *s_idx,
                                     struct dt_object **stripe,
                                     struct thandle *th,
                                     struct ost_pool *inuse)
{
        struct dt_object   *o;
        __u32 stripe_idx = *s_idx;
        int rc;

        rc = lod_statfs_and_check(env, m, ost_idx, sfs);
        if (rc) {
                /* this OSP doesn't feel well */
                goto out_return;
        }

        /*
         * We expect number of precreated objects in f_ffree at
         * the first iteration, skip OSPs with no objects ready
         */
        if (sfs->os_fprecreated == 0 && speed == 0) {
                QOS_DEBUG("#%d: precreation is empty\n", ost_idx);
                goto out_return;
        }

        ...
}
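
The control flow described above can be reduced to a small standalone model. This is only a sketch of the two quoted branches, assuming dt_statfs() itself succeeded (rc == 0). Every name prefixed with model_ or MODEL_ is invented for illustration, the flag values are placeholders rather than the values in the Lustre headers, and nothing from the elided "..." sections or the OSP precreate thread is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values for this sketch only; not taken from Lustre headers. */
#define MODEL_STATE_ENOSPC 0x1u
#define MODEL_STATE_ENOINO 0x2u

/* Hypothetical stand-in for the two obd_statfs fields the ticket discusses. */
struct model_statfs {
        uint32_t os_state;
        uint32_t os_fprecreated;
};

/* Outcomes of the modelled reserve path (names invented for illustration). */
enum model_outcome {
        MODEL_SKIP_STATFS_ERROR,  /* first "goto out_return": statfs check failed */
        MODEL_SKIP_NO_PRECREATED, /* second "goto out_return": speed == 0, nothing precreated */
        MODEL_CONTINUE            /* falls through to the elided part of the function */
};

/* Mirrors only the condition quoted from lod_statfs_and_check(). */
static int model_statfs_check(const struct model_statfs *sfs)
{
        assert(sfs != NULL);

        if ((sfs->os_state & MODEL_STATE_ENOSPC) ||
            ((sfs->os_state & MODEL_STATE_ENOINO) && sfs->os_fprecreated == 0))
                return -ENOSPC;

        return 0;
}

/* Mirrors only the two early exits quoted from lod_check_and_reserve_ost(). */
static enum model_outcome
model_check_and_reserve(const struct model_statfs *sfs, uint32_t speed)
{
        if (model_statfs_check(sfs) != 0)
                return MODEL_SKIP_STATFS_ERROR;

        if (sfs->os_fprecreated == 0 && speed == 0)
                return MODEL_SKIP_NO_PRECREATED;

        return MODEL_CONTINUE;
}
```

In this model, a target with the ENOINO flag set and os_fprecreated == 0 yields MODEL_SKIP_STATFS_ERROR for any value of speed, so the later os_fprecreated test is never reached for it. Whether the real target recovers depends on code outside the quoted excerpts.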


 Comments   
Comment by John Hammond [ 04/May/17 ]

Alex, could you take a look and confirm my reasoning?

Comment by Alex Zhuravlev [ 04/May/17 ]

Hmm, the OSP should generally be doing precreation on its own. The same should apply to os_state. Do you have a specific case or test failure?

Comment by John Hammond [ 04/May/17 ]

When I created this, I missed some calls to osp_pre_update_status().

Comment by Andreas Dilger [ 04/May/17 ]

It looks like this is causing the LU-9096 failures.

Comment by Peter Jones [ 04/May/17 ]

So should we close LU-9096 as a duplicate of this one? It seems like there is more analysis here...

Comment by Alex Zhuravlev [ 05/May/17 ]

I think LU-9096 is rather an issue in the test.
