[LU-9450] precreate logic badness between lod_statfs_and_check() and lod_check_and_reserve_ost() Created: 04/May/17 Updated: 05/May/17 Resolved: 05/May/17 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | John Hammond | Assignee: | WC Triage |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
If sfs->os_state & OS_STATE_ENOINO && sfs->os_fprecreated == 0 is true in lod_statfs_and_check(), then it returns -ENOSPC, which causes an early return from lod_check_and_reserve_ost(). So it seems like we never wake up the precreate thread and this becomes a permanent condition:

static int lod_statfs_and_check(const struct lu_env *env, struct lod_device *d,
                                int index, struct obd_statfs *sfs)
{
        struct lod_tgt_desc *ost;
        int rc;
        ENTRY;

        LASSERT(d);
        ost = OST_TGT(d,index);
        LASSERT(ost);

        rc = dt_statfs(env, ost->ltd_ost, sfs);

        if (rc == 0 && ((sfs->os_state & OS_STATE_ENOSPC) ||
            (sfs->os_state & OS_STATE_ENOINO && sfs->os_fprecreated == 0)))
                RETURN(-ENOSPC);
        ...
}

static int lod_check_and_reserve_ost(const struct lu_env *env,
                                     struct lod_device *m,
                                     struct obd_statfs *sfs, __u32 ost_idx,
                                     __u32 speed, __u32 *s_idx,
                                     struct dt_object **stripe,
                                     struct thandle *th,
                                     struct ost_pool *inuse)
{
        struct dt_object *o;
        __u32 stripe_idx = *s_idx;
        int rc;

        rc = lod_statfs_and_check(env, m, ost_idx, sfs);
        if (rc) {
                /* this OSP doesn't feel well */
                goto out_return;
        }

        /*
         * We expect number of precreated objects in f_ffree at
         * the first iteration, skip OSPs with no objects ready
         */
        if (sfs->os_fprecreated == 0 && speed == 0) {
                QOS_DEBUG("#%d: precreation is empty\n", ost_idx);
                goto out_return;
        }
        ...
} |
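To make the control flow in question easier to follow, here is a minimal standalone model of the two early exits quoted above. It is only a sketch, not Lustre code: the `model_*` names, the `MODEL_STATE_*` bit values, the cut-down statfs struct, and the three-way return enum are all hypothetical stand-ins introduced for illustration. It models only the quoted LOD-side checks and says nothing about what the OSP does on its own to refresh os_state or precreate objects.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only; not taken from Lustre headers. */
#define MODEL_STATE_ENOSPC 0x1u
#define MODEL_STATE_ENOINO 0x2u

/* Cut-down stand-in for struct obd_statfs: only the fields the ticket discusses. */
struct model_statfs {
        uint32_t os_state;
        uint32_t os_fprecreated;
};

/* Hypothetical outcomes, added so the two non-error paths can be told apart. */
enum model_outcome {
        MODEL_SKIP_EMPTY = 0,   /* passed statfs check, but precreation is empty */
        MODEL_PROCEED    = 1    /* would go on to reserve an object */
};

/* Mirrors the condition quoted from lod_statfs_and_check(). */
static int model_statfs_and_check(const struct model_statfs *sfs)
{
        assert(sfs != NULL);

        if ((sfs->os_state & MODEL_STATE_ENOSPC) ||
            ((sfs->os_state & MODEL_STATE_ENOINO) && sfs->os_fprecreated == 0))
                return -ENOSPC;
        return 0;
}

/* Mirrors the order of the two early exits in lod_check_and_reserve_ost(). */
static int model_check_and_reserve(const struct model_statfs *sfs, uint32_t speed)
{
        int rc = model_statfs_and_check(sfs);

        if (rc != 0)
                return rc;              /* "this OSP doesn't feel well" */

        if (sfs->os_fprecreated == 0 && speed == 0)
                return MODEL_SKIP_EMPTY; /* "precreation is empty" */

        return MODEL_PROCEED;
}
```

In this model, a target with the ENOINO bit set and zero precreated objects returns -ENOSPC from the first check regardless of `speed`, so the "precreation is empty" branch is never reached for it. Whether that state can actually persist depends on code outside the quoted snippet.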
| Comments |
| Comment by John Hammond [ 04/May/17 ] |
|
Alex, could you take a look and confirm my reasoning? |
| Comment by Alex Zhuravlev [ 04/May/17 ] |
|
Hmm, the OSP should generally be doing precreation on its own. The same should apply to os_state. Do you have a specific case or test failure? |
| Comment by John Hammond [ 04/May/17 ] |
|
When I created this, I missed some calls to osp_pre_update_status(). |
| Comment by Andreas Dilger [ 04/May/17 ] |
|
It looks like this is causing the LU-9096 failures. |
| Comment by Peter Jones [ 04/May/17 ] |
|
So should we close LU-9096 as a duplicate of this one? It seems like there is more analysis here... |
| Comment by Alex Zhuravlev [ 05/May/17 ] |
|
I think LU-9096 is rather an issue in the test. |