Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.7.0
    • None
    • 15050

    Description

      It would be good to have a variant of the existing OBD_FAIL_CHECK() macro designed specifically for randomized fault injection. What I want may be possible with what we have now, but I suspect it is not. One issue is that not all OBD_FAIL_XXX locations are suitable for randomized fault injection, so we cannot simply grep for OBD_FAIL_XXX in obd_support.h and inject each one in turn during various workloads.

      Here's what I want:

      1. OBD_FAULT_CHECK() should accept existing fail locs.
      2. If OBD_FAIL_CHECK(loc) returns true then so should OBD_FAULT_CHECK(loc).
      3. If a given location is deemed good for randomized fault injection then we just replace OBD_FAIL_CHECK() with OBD_FAULT_CHECK() and we're good.
      4. OBD_FAULT_CHECK() should also be triggered by setting CFS_FAULT (0x02000000) in cfs_fail_loc. This allows randomly triggering any site that uses OBD_FAULT_CHECK().
      5. We should expect (enforce) that triggered OBD_FAULT_CHECKs be recovered from.
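Points 1, 2, and 4 can be modeled in userspace roughly as follows. This is only a sketch of the requested semantics, not the Lustre implementation: the `sketch_*` functions and `SKETCH_LOC_MASK` are hypothetical names, the global here is a plain stand-in for the real `cfs_fail_loc`, and the only value taken from the description is CFS_FAULT (0x02000000). Randomization (deciding how often a set CFS_FAULT bit actually fires) is left out to keep the example deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Value taken from the description above. */
#define CFS_FAULT	0x02000000UL
/* Hypothetical: bits that identify a site in this sketch only. */
#define SKETCH_LOC_MASK	0x0000ffffUL

/* Userspace stand-in for the global fail location. */
unsigned long cfs_fail_loc;

/* Models OBD_FAIL_CHECK(): true only when the configured site id matches. */
bool sketch_fail_check(unsigned long loc)
{
	unsigned long set = cfs_fail_loc & SKETCH_LOC_MASK;

	return set != 0 && set == (loc & SKETCH_LOC_MASK);
}

/*
 * Models the proposed OBD_FAULT_CHECK(): fires whenever the existing
 * check would fire (point 2), and also fires at every such site when
 * the CFS_FAULT bit is set in cfs_fail_loc (point 4).
 */
bool sketch_fault_check(unsigned long loc)
{
	if (sketch_fail_check(loc))
		return true;

	return (cfs_fail_loc & CFS_FAULT) != 0;
}

/* Small walk-through of the two ways a fault site can be triggered. */
void sketch_demo(void)
{
	cfs_fail_loc = 0x1234;			/* target one site by id */
	assert(sketch_fault_check(0x1234));
	assert(!sketch_fault_check(0x1235));

	cfs_fail_loc = CFS_FAULT;		/* target every fault site */
	assert(sketch_fault_check(0x1234));
	assert(sketch_fault_check(0x1235));
	assert(!sketch_fail_check(0x1234));	/* plain FAIL sites unaffected */

	cfs_fail_loc = 0;			/* nothing armed */
	assert(!sketch_fault_check(0x1234));
}
```

In this model a site that still uses the plain check is never hit by CFS_FAULT, which is what makes it safe to convert sites one at a time as in point 3.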

      It may be worthwhile to add a cfs_fail_err for use with OBD_{FAIL,FAULT}_CHECK().

      extern long cfs_fail_err;
      
      int dt_declare_bankruptcy(const struct lu_env *env, ...)
      {
              if (OBD_FAULT_CHECK(OBD_FAIL_DT_DECLARE_BANKRUPTCY))
                      RETURN(cfs_fail_err);
      
              ...
      }
      
      void *lu_alloc_gater(const struct lu_env *env, ...)
      {
              if (OBD_FAULT_CHECK(OBD_FAIL_LU_ALLOC_GATER))
                      RETURN(ERR_PTR(cfs_fail_err));
      
              ...
      }
      
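The two functions above return the same error through different channels: one as an int, one encoded in a pointer. A userspace sketch of that idea follows, assuming cfs_fail_err holds a negative errno value (the description does not say so). The `sketch_err_ptr()`, `sketch_ptr_err()`, and `sketch_is_err()` helpers only imitate the kernel's ERR_PTR(), PTR_ERR(), and IS_ERR(); the two `sketch_*_site()` functions and their `fault` argument are hypothetical stand-ins for a site guarded by OBD_FAULT_CHECK().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* The kernel reserves the top 4095 addresses for encoded errors. */
#define SKETCH_MAX_ERRNO 4095

/* Assumption: the injected error is stored as a negative errno. */
long cfs_fail_err = -ENOMEM;

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* int-returning site: hands the injected error back directly. */
int sketch_int_site(int fault)
{
	if (fault)
		return (int)cfs_fail_err;

	return 0;
}

/* pointer-returning site: encodes the same error in the pointer. */
void *sketch_ptr_site(int fault)
{
	static int object;	/* placeholder for a real allocation */

	if (fault)
		return sketch_err_ptr(cfs_fail_err);

	return &object;
}
```

A caller would test the pointer with `sketch_is_err()` and recover the code with `sketch_ptr_err()`, so a single cfs_fail_err setting can drive both kinds of site.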

      I welcome any suggestions here.

          Activity

            [LU-5409] add OBD_FAULT_CHECK

            jlevi Jodi Levi (Inactive) added a comment - Reopening to add label

            fzago Frank Zago (Inactive) added a comment - Never mind that. I had a copy of old obd_support.h lying in lustre's root.

            fzago Frank Zago (Inactive) added a comment - This new patch prevents compilation on CentOS 6.5:

            make[3]: Entering directory `/root/rpmbuild/BUILD/kernel-2.6.32.431.5.1.el6_lustre'
              CC [M]  /root/lustre-cleanup/lustre/osd-ldiskfs/osd_handler.o
            In file included from /root/lustre-cleanup/lustre/include/lu_target.h:40,
                             from /root/lustre-cleanup/lustre/include/obd.h:57,
                             from /root/lustre-cleanup/lustre/osd-ldiskfs/osd_internal.h:61,
                             from /root/lustre-cleanup/lustre/osd-ldiskfs/osd_handler.c:68:
            /root/lustre-cleanup/lustre/include/dt_object.h: In function ‘dt_declare_create’:
            /root/lustre-cleanup/lustre/include/dt_object.h:1071: error: ‘OBD_FAIL_DT_DECLARE_CREATE’ undeclared (first use in this function)
            /root/lustre-cleanup/lustre/include/dt_object.h:1071: error: (Each undeclared identifier is reported only once
            /root/lustre-cleanup/lustre/include/dt_object.h:1071: error: for each function it appears in.)
            /root/lustre-cleanup/lustre/include/dt_object.h: In function ‘dt_create’:
            /root/lustre-cleanup/lustre/include/dt_object.h:1088: error: ‘OBD_FAIL_DT_CREATE’ undeclared (first use in this function)
            .....

            Reverting the patch fixes the issue.

            jhammond John Hammond added a comment - Please see http://review.whamcloud.com/#/c/11263/ .

            People

              jhammond John Hammond
              jhammond John Hammond