[LU-6291] conf-sanity test_41a: failed to respond and timed out Created: 26/Feb/15  Updated: 09/Sep/16  Resolved: 31/Mar/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: dne2

Issue Links:
Related
is related to LU-3534 async update cross-MDTs Resolved
Severity: 3
Rank (Obsolete): 17624

 Description   

This issue was created by maloo for wangdi <di.wang@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/79052dac-bd31-11e4-8d85-5254006e85c2.

The sub-test test_41a failed with the following error:

test failed to respond and timed out

Please provide additional information about the failure here.

Info required for matching: conf-sanity 41a



 Comments   
Comment by Niu Yawei (Inactive) [ 12/Mar/15 ]

This is a test script problem. In a DNE environment, this test (conf-sanity 41a) starts only the master MDT, so the client mount always fails with -EAGAIN. See tgt_handle_connect() -> mdt_obd_connect() -> lod_obd_get_info().

I think we should change the test script to replace "start $SINGLEMDS" with "start_mds". Some other tests have the same problem, 41b for instance.
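
A minimal sketch of the failure mode described above, written in Python rather than the test framework's shell. It assumes a simplified model in which a client connect is refused with -EAGAIN until every MDT is running. The names Cluster, start_single_mds, start_all_mds and client_mount are hypothetical and do not correspond to Lustre code.

```python
import errno


class Cluster:
    """Toy model of a DNE filesystem: how many MDTs exist and which are up."""

    def __init__(self, mdt_count):
        self.mdt_count = mdt_count
        self.started = set()

    def start_single_mds(self, index=0):
        # Stands in for `start $SINGLEMDS`: only one MDT is brought up.
        self.started.add(index)

    def start_all_mds(self):
        # Stands in for `start_mds`, taken here to bring up every MDT.
        self.started.update(range(self.mdt_count))

    def client_mount(self):
        # Assumed rule: the connect is refused with -EAGAIN
        # while any MDT is still down.
        if len(self.started) < self.mdt_count:
            return -errno.EAGAIN
        return 0
```

With a single MDT the two start paths behave the same in this model, which would explain why the problem only shows up in DNE runs.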

Comment by Niu Yawei (Inactive) [ 12/Mar/15 ]

The -EAGAIN error was introduced by:

commit daa691c4dcbc5a70acf4cf161c581b45c104c87a
Author: Wang Di <di.wang@intel.com>
Date:   Mon Aug 11 13:46:38 2014 -0700

    LU-3536 lod: write updates to update log

    For cross-MDT operation, LOD will write updates into the
    update log on all of MDTs.

    1. In transaction start, LOD prepares the update records
       buffer for cross-MDT operation.
    2. Sub LOD collects all updates in execution phase.
    3. In transaction stop, LOD will write these updates as
       llog record on all of MDTs.

    Change-Id: Ibba79267393db00ba05e0aa2df9865f88149eaa4
    Signed-off-by: Wang Di <di.wang@intel.com>

This means a client cannot mount until all MDTs have started. I'm not sure whether that is by design or a defect. If it is a defect, we should fix it rather than the test script.
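
For reference, the three steps listed in the commit message can be sketched roughly as below. This is an illustrative Python model only, not the LOD implementation: the class CrossMdtTransaction and its methods are hypothetical, each MDT's update log is represented by a plain list, and "all of MDTs" is taken to mean whichever MDTs are passed in.

```python
class CrossMdtTransaction:
    """Toy model of the start / execute / stop phases of a cross-MDT update."""

    def __init__(self, mdt_logs):
        # mdt_logs: dict mapping an MDT index to the list used as its update log.
        self.mdt_logs = mdt_logs
        self.updates = None

    def start(self):
        # Step 1: prepare the update records buffer.
        self.updates = []

    def add_update(self, mdt_index, op):
        # Step 2: collect each update during the execution phase.
        self.updates.append((mdt_index, op))

    def stop(self):
        # Step 3: write the collected updates as one record
        # to the log of every MDT passed in.
        record = tuple(self.updates)
        for log in self.mdt_logs.values():
            log.append(record)
        self.updates = None
        return record
```

In this model every log ends up holding the same record, so the record can only be written completely when every target MDT is reachable.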

Comment by Di Wang [ 31/Mar/15 ]

Yes, that is the rule: if the failover MDT cannot connect to all of the other MDTs, it has to wait until the administrator aborts the recovery. I will fix the test script for now.
