[LU-1972] Test failure on test suite replay-single, subtest test_53a
Created: 18/Sep/12  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4218

Description

This issue was created by maloo for Li Wei <liwei@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/0b17713c-015a-11e2-bc4e-52540035b04c.

The sub-test test_53a failed with the following error:

test_53a failed with 2

Info required for matching: replay-single 53a



Comments
Comment by Li Wei (Inactive) [ 18/Sep/12 ]

It seems OBD_FAIL_MDS_CLOSE_NET is defined but never used anywhere in the code?

Comment by Ian Colle (Inactive) [ 28/Sep/12 ]

https://maloo.whamcloud.com/test_sets/b5b17ebc-0939-11e2-a95c-52540035b04c

Comment by Andreas Dilger [ 12/Oct/12 ]

Li Wei, it seems that OBD_FAIL_MDS_CLOSE_NET is in fact used, but the reference to it is generated in an extremely confusing manner:

#define DEF_HNDL(prefix, base, suffix, flags, opc, fn, fmt)             \
[prefix ## _ ## opc - prefix ## _ ## base] = {                          \
        .mh_name    = #opc,                                             \
        .mh_fail_id = OBD_FAIL_ ## prefix ## _  ## opc ## suffix,       \
        .mh_opc     = prefix ## _  ## opc,                              \
        .mh_flags   = flags,                                            \
        .mh_act     = fn,                                               \
        .mh_fmt     = fmt                                               \
}

#define DEF_MDT_HNDL_F(flags, name, fn)                                 \
        DEF_HNDL(MDS, GETATTR, _NET, flags, name, fn, &RQF_MDS_ ## name)

static struct mdt_handler mdt_mds_ops[] = {
        :
        :
DEF_MDT_HNDL_F(HABEO_CORPUS,              CLOSE,        mdt_close),

static int mdt_req_handle(struct mdt_thread_info *info,
                          struct mdt_handler *h, struct ptlrpc_request *req)
{
        :
        :

        /*
         * Checking for various OBD_FAIL_$PREF_$OPC_NET codes. _Do_ not try
         * to put same checks into handlers like mdt_close(), mdt_reint(),
         * etc., without talking to mdt authors first. Checking same thing
         * there again is useless and returning 0 error without packing reply
         * is buggy! Handlers either pack reply or return error.
         *
         * We return 0 here and do not send any reply in order to emulate
         * network failure. Do not send any reply in case any of NET related
         * fail_id has occurred.
         */
        if (OBD_FAIL_CHECK_ORSET(h->mh_fail_id, OBD_FAIL_ONCE))
                RETURN(0);

and the only reason I found it was by coincidence, because mdt_close() is mentioned in the comment above. Searching for OBD_FAIL_MDS_CLOSE_NET, MDS_CLOSE, or RQF_MDS_CLOSE produced nothing, since the macros assemble those names by token pasting. I had found the mdt_close() function and started searching for it in the code when I came across the comment.
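To illustrate why a plain text search comes up empty, here is a minimal sketch of the same pasting scheme, cut down to three fields. The names DEMO_HNDL, struct demo_handler, demo_ops, and demo_find are hypothetical, and the numeric values are placeholders chosen for the example rather than taken from the Lustre headers.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for illustration only, not taken from the Lustre headers. */
enum { MDS_GETATTR = 33, MDS_CLOSE = 35 };
#define OBD_FAIL_MDS_CLOSE_NET 0x999

/* Hypothetical, trimmed-down stand-in for struct mdt_handler. */
struct demo_handler {
        const char *mh_name;
        int         mh_fail_id;
        int         mh_opc;
};

/* Same token-pasting scheme as DEF_HNDL, reduced to three fields. */
#define DEMO_HNDL(prefix, base, suffix, opc)                            \
[prefix ## _ ## opc - prefix ## _ ## base] = {                          \
        .mh_name    = #opc,                                             \
        .mh_fail_id = OBD_FAIL_ ## prefix ## _ ## opc ## suffix,        \
        .mh_opc     = prefix ## _ ## opc,                               \
}

/* Only the fragments MDS, GETATTR, _NET and CLOSE are written here;
 * the preprocessor glues them into the full identifiers. */
static const struct demo_handler demo_ops[] = {
        DEMO_HNDL(MDS, GETATTR, _NET, CLOSE),
};

/* Look up a handler by opcode, mirroring the [opc - base] indexing. */
static const struct demo_handler *demo_find(int opc)
{
        size_t idx;

        assert(opc >= MDS_GETATTR);
        idx = (size_t)(opc - MDS_GETATTR);
        if (idx >= sizeof(demo_ops) / sizeof(demo_ops[0]))
                return NULL;
        return &demo_ops[idx];
}
```

In this sketch, demo_find(MDS_CLOSE) returns an entry whose mh_fail_id equals OBD_FAIL_MDS_CLOSE_NET, although the table itself never spells that identifier out, so a search for the full name only ever hits the #define.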

Grrr, this makes the MDS code nearly impossible to follow (as if it wasn't already)...

I've submitted http://review.whamcloud.com/4260 to address the terrible coding style. This may make it easier to understand the code, but does nothing to actually fix the problem reported here.

Comment by Jian Yu [ 22/Dec/12 ]

Lustre Client: 2.1.4 RC2
Lustre Server: 2.2.0
Distro/Arch: RHEL6.3/x86_64

The same issue occurred: https://maloo.whamcloud.com/test_sets/16e08334-4bea-11e2-a817-52540035b04c

Comment by Sarah Liu [ 14/Jan/13 ]

Another instance seen with a 2.3.0 server and a 2.4 client:
https://maloo.whamcloud.com/test_sets/8cf3b9cc-5b55-11e2-8985-52540035b04c

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.

Generated at Sat Feb 10 01:21:17 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.