Lustre / LU-9450

precreate logic badness between lod_statfs_and_check() and lod_check_and_reserve_ost()

Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • Lustre 2.10.0
    • Lustre 2.9.0
    • None
    • 3
    • 9223372036854775807

    Description

      If sfs->os_state & OS_STATE_ENOINO && sfs->os_fprecreated == 0 is true in lod_statfs_and_check(), the function returns -ENOSPC, which causes an early return from lod_check_and_reserve_ost(). It therefore seems that we never wake up the precreate thread, and this becomes a permanent condition:

      static int lod_statfs_and_check(const struct lu_env *env, struct lod_device *d,
                                      int index, struct obd_statfs *sfs)
      {
              struct lod_tgt_desc *ost;
              int                  rc;
              ENTRY;
      
              LASSERT(d);
              ost = OST_TGT(d,index);
              LASSERT(ost);
      
              rc = dt_statfs(env, ost->ltd_ost, sfs);
      
              if (rc == 0 && ((sfs->os_state & OS_STATE_ENOSPC) ||
                  (sfs->os_state & OS_STATE_ENOINO && sfs->os_fprecreated == 0)))
                      RETURN(-ENOSPC);
      
              ...
      }
      
      static int lod_check_and_reserve_ost(const struct lu_env *env,
                                           struct lod_device *m,
                                           struct obd_statfs *sfs, __u32 ost_idx,
                                           __u32 speed, __u32 *s_idx,
                                           struct dt_object **stripe,
                                           struct thandle *th,
                                           struct ost_pool *inuse)
      {
              struct dt_object   *o;
              __u32 stripe_idx = *s_idx;
              int rc;
      
              rc = lod_statfs_and_check(env, m, ost_idx, sfs);
              if (rc) {
                      /* this OSP doesn't feel well */
                      goto out_return;
              }
      
              /*
               * We expect number of precreated objects in f_ffree at
               * the first iteration, skip OSPs with no objects ready
               */
              if (sfs->os_fprecreated == 0 && speed == 0) {
                      QOS_DEBUG("#%d: precreation is empty\n", ost_idx);
                      goto out_return;
              }
      
              ...
      }
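
      The interaction between the two checks quoted above can be reproduced in isolation. The following is a minimal sketch, not Lustre code: the sketch_* names, the struct, and the flag values are hypothetical stand-ins for os_state, os_fprecreated, OS_STATE_ENOSPC and OS_STATE_ENOINO, and the OSP precreate thread is not modelled at all. It only shows that when the ENOINO flag is set and no objects are precreated, the first check returns -ENOSPC before the "precreation is empty" test is ever reached, whatever the value of speed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values for illustration; not the Lustre definitions. */
#define SKETCH_STATE_ENOSPC 0x1u
#define SKETCH_STATE_ENOINO 0x2u

/* Stand-in for the two obd_statfs fields used by the quoted code. */
struct sketch_statfs {
        unsigned int state;       /* plays the role of os_state */
        unsigned int fprecreated; /* plays the role of os_fprecreated */
};

/* Mirrors only the condition quoted from lod_statfs_and_check(). */
static int sketch_statfs_and_check(const struct sketch_statfs *sfs)
{
        assert(sfs != NULL);

        if ((sfs->state & SKETCH_STATE_ENOSPC) ||
            ((sfs->state & SKETCH_STATE_ENOINO) && sfs->fprecreated == 0))
                return -ENOSPC;

        return 0;
}

/*
 * Mirrors the control flow quoted from lod_check_and_reserve_ost().
 * Returns a negative errno on the early return, 0 when the target is
 * skipped because precreation is empty, and 1 when the (elided)
 * reservation step would be reached.
 */
static int sketch_check_and_reserve(const struct sketch_statfs *sfs,
                                    unsigned int speed)
{
        int rc = sketch_statfs_and_check(sfs);

        if (rc)
                return rc; /* "this OSP doesn't feel well" */

        if (sfs->fprecreated == 0 && speed == 0)
                return 0;  /* "precreation is empty" */

        return 1;
}
```

      With state set to SKETCH_STATE_ENOINO and fprecreated set to 0, sketch_check_and_reserve() returns -ENOSPC for any speed. Whether the real condition can clear again depends on code outside these two functions, which this sketch deliberately leaves out.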
      

          Activity

            bzzz Alex Zhuravlev added a comment - I think LU-9096 is rather an issue in the test.
            pjones Peter Jones added a comment - So should we close LU-9096 as a duplicate of this one? It seems like there is more analysis here...

            adilger Andreas Dilger added a comment - It looks like this is causing the LU-9096 failures.
            jhammond John Hammond added a comment - When I created this, I missed some calls to osp_pre_update_status().

            bzzz Alex Zhuravlev added a comment - Hmm, OSP should generally be doing precreation on its own. The same should apply to os_state. Do you have a specific case/test failure?
            jhammond John Hammond added a comment - Alex, could you take a look and confirm my reasoning?

            People

              wc-triage WC Triage
              jhammond John Hammond