[LU-5409] add OBD_FAULT_CHECK Created: 24/Jul/14  Updated: 15/Jul/15  Resolved: 27/Apr/15

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.7.0

Type: Improvement Priority: Minor
Reporter: John Hammond Assignee: John Hammond
Resolution: Fixed Votes: 0
Labels: fault

Rank (Obsolete): 15050

 Description   

It would be good to have a variant of the existing OBD_FAIL_CHECK() macro designed specifically for randomized fault injection. What I want may be achievable with what we have now, but I suspect it is not. One issue is that not all OBD_FAIL_XXX locations are suitable for randomized fault injection. Because of this, we cannot simply grep obd_support.h for OBD_FAIL_XXX and inject each location in turn during various workloads.

Here's what I want:

  1. OBD_FAULT_CHECK() should accept existing fail locs.
  2. If OBD_FAIL_CHECK(loc) returns true then so should OBD_FAULT_CHECK(loc).
  3. If a given location is deemed good for randomized fault injection then we just replace OBD_FAIL_CHECK() with OBD_FAULT_CHECK() and we're good.
  4. OBD_FAULT_CHECK() should also be triggered by setting CFS_FAULT (0x02000000) in cfs_fail_loc. This allows randomly triggering any site that uses OBD_FAULT_CHECK().
  5. We should expect (enforce) that triggered OBD_FAULT_CHECKs be recovered from.
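
The relationship between the two checks in points 1 through 4 can be modelled in userspace. This is only a minimal sketch under simplifying assumptions: the `sketch_*` functions, `SKETCH_LOC_MASK`, and the two site ids are hypothetical names invented for illustration, only `CFS_FAULT` (0x02000000) and `cfs_fail_loc` come from the description above, and probabilistic triggering is left out so that a set `CFS_FAULT` bit fires every fault site.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFS_FAULT          0x02000000u /* flag value quoted in the ticket */
#define SKETCH_LOC_MASK    0x0000ffffu /* hypothetical: low 16 bits carry the site id */

/* Hypothetical site ids, not taken from obd_support.h. */
#define SKETCH_FAIL_SITE_A 0x2001u
#define SKETCH_FAIL_SITE_B 0x2002u

/* Userspace stand-in for the global fail location. */
static uint32_t cfs_fail_loc;

/* Model of OBD_FAIL_CHECK(): fires only for the one site that was armed. */
static bool sketch_fail_check(uint32_t id)
{
	uint32_t armed = cfs_fail_loc & SKETCH_LOC_MASK;

	return armed != 0 && armed == (id & SKETCH_LOC_MASK);
}

/*
 * Model of the proposed OBD_FAULT_CHECK(): fires whenever the plain check
 * would fire (point 2), and also whenever CFS_FAULT is set (point 4).
 */
static bool sketch_fault_check(uint32_t id)
{
	return sketch_fail_check(id) || (cfs_fail_loc & CFS_FAULT) != 0;
}
```

With `cfs_fail_loc` set to `SKETCH_FAIL_SITE_A`, both checks fire for site A only. With `cfs_fail_loc` set to `CFS_FAULT`, every `sketch_fault_check()` site fires while `sketch_fail_check()` sites stay quiet, which is what keeps locations unsuitable for random injection out of the random set.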

It may be worthwhile to add a cfs_fail_err for use with OBD_{FAIL,FAULT}_CHECK().

extern long cfs_fail_err;

int dt_declare_bankruptcy(const struct lu_env *env, ...)
{
        if (OBD_FAULT_CHECK(OBD_FAIL_DT_DECLARE_BANKRUPTCY))
                RETURN(cfs_fail_err);

        ...
}

void *lu_alloc_gater(const struct lu_env *env, ...)
{
        if (OBD_FAULT_CHECK(OBD_FAIL_LU_ALLOC_GATER))
                RETURN(ERR_PTR(cfs_fail_err));

        ...
}
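
To illustrate how a caller might recover from an injected pointer-returning failure (point 5), here is a hedged userspace sketch. The `sketch_*` helpers imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention but are hypothetical stand-ins written for this example, the fault site tests the `CFS_FAULT` bit directly rather than using a real OBD_FAULT_CHECK(), and the default value of `cfs_fail_err` is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define CFS_FAULT        0x02000000u /* flag value quoted in the ticket */
#define SKETCH_MAX_ERRNO 4095        /* same bound the kernel uses for error pointers */

static uint32_t cfs_fail_loc;        /* userspace stand-in for the global */
static long cfs_fail_err = -ENOMEM;  /* assumed default; settable by the test harness */

/* Hypothetical stand-ins for ERR_PTR(), IS_ERR(), and PTR_ERR(). */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static char sketch_buffer[64];

/* Fault site: returns an encoded error instead of a buffer when injected. */
static void *sketch_alloc(void)
{
	if (cfs_fail_loc & CFS_FAULT)
		return sketch_err_ptr(cfs_fail_err);

	return sketch_buffer;
}

/* Caller that recovers: it propagates the error rather than dereferencing. */
static int sketch_use_buffer(void)
{
	void *buf = sketch_alloc();

	if (sketch_is_err(buf))
		return (int)sketch_ptr_err(buf);

	((char *)buf)[0] = 'x';
	return 0;
}
```

Making the injected error configurable lets one fault site exercise several recovery paths: setting `cfs_fail_err` to `-EIO` before the call makes `sketch_use_buffer()` return `-EIO` instead of `-ENOMEM`.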

I welcome any suggestions here.



 Comments   
Comment by John Hammond [ 29/Jul/14 ]

Please see http://review.whamcloud.com/#/c/11263/.

Comment by Frank Zago (Inactive) [ 07/Aug/14 ]

This new patch breaks compilation on CentOS 6.5:

make[3]: Entering directory `/root/rpmbuild/BUILD/kernel-2.6.32.431.5.1.el6_lustre'
  CC [M]  /root/lustre-cleanup/lustre/osd-ldiskfs/osd_handler.o
In file included from /root/lustre-cleanup/lustre/include/lu_target.h:40,
                 from /root/lustre-cleanup/lustre/include/obd.h:57,
                 from /root/lustre-cleanup/lustre/osd-ldiskfs/osd_internal.h:61,
                 from /root/lustre-cleanup/lustre/osd-ldiskfs/osd_handler.c:68:
/root/lustre-cleanup/lustre/include/dt_object.h: In function ‘dt_declare_create’:
/root/lustre-cleanup/lustre/include/dt_object.h:1071: error: ‘OBD_FAIL_DT_DECLARE_CREATE’ undeclared (first use in this function)
/root/lustre-cleanup/lustre/include/dt_object.h:1071: error: (Each undeclared identifier is reported only once
/root/lustre-cleanup/lustre/include/dt_object.h:1071: error: for each function it appears in.)
/root/lustre-cleanup/lustre/include/dt_object.h: In function ‘dt_create’:
/root/lustre-cleanup/lustre/include/dt_object.h:1088: error: ‘OBD_FAIL_DT_CREATE’ undeclared (first use in this function)
.....

Reverting the patch fixes the issue.

Comment by Frank Zago (Inactive) [ 07/Aug/14 ]

Never mind that. I had an old copy of obd_support.h lying in Lustre's root directory.

Comment by Jodi Levi (Inactive) [ 20/Aug/14 ]

Reopening to add label

Generated at Sat Feb 10 01:51:15 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.