[LU-6291] conf-sanity test_41a: failed to respond and timed out Created: 26/Feb/15 Updated: 09/Sep/16 Resolved: 31/Mar/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne2 | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 17624 | ||||||||
| Description |
|
This issue was created by maloo for wangdi <di.wang@intel.com>. This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/79052dac-bd31-11e4-8d85-5254006e85c2. The sub-test test_41a failed with the following error: "test failed to respond and timed out". Please provide additional information about the failure here. Info required for matching: conf-sanity 41a |
| Comments |
| Comment by Niu Yawei (Inactive) [ 12/Mar/15 ] |
|
That's a test script problem: in a DNE environment, this test (conf-sanity 41a) starts only the master MDT, so the client mount always fails with -EAGAIN. See tgt_handle_connect() -> mdt_obd_connect() -> lod_obd_get_info(). I think we should change the test script to replace "start $SINGLEMDS" with "start_mds". Some other tests have the same problem, 41b for instance. |
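The failure mode described above can be illustrated with a small toy model. This is only a sketch: the `toy_*` functions, the `STARTED` variable, and the value `MDSCOUNT=2` are hypothetical stand-ins for illustration and are not the real test-framework helpers or Lustre code. The model assumes that a client mount is refused with EAGAIN (errno 11 on Linux) unless every MDT is up.

```shell
#!/bin/bash
# Toy model only: these functions are stand-ins, not real test-framework helpers.
MDSCOUNT=2      # assumed DNE setup with two MDTs
STARTED=""      # space-separated list of MDT indices that are "up"

# Hypothetical stand-in for starting a single MDT.
toy_start_mdt() {
	STARTED="$STARTED $1"
}

# Stand-in for what the proposed change is expected to do: start every MDT.
toy_start_all_mdts() {
	local num
	for num in $(seq "$MDSCOUNT"); do
		toy_start_mdt "$num"
	done
}

# The client mount succeeds only when every MDT is up;
# otherwise it returns 11 (EAGAIN on Linux).
toy_mount_client() {
	local num
	for num in $(seq "$MDSCOUNT"); do
		case " $STARTED " in
		*" $num "*) ;;          # this MDT is up, check the next one
		*) return 11 ;;         # at least one MDT is missing
		esac
	done
	return 0
}
```

In this model, calling `toy_start_mdt 1` alone leaves `toy_mount_client` failing with 11, which mirrors a test that starts only the master MDT, while `toy_start_all_mdts` lets the mount succeed.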
| Comment by Niu Yawei (Inactive) [ 12/Mar/15 ] |
|
The -EAGAIN error is introduced by:
commit daa691c4dcbc5a70acf4cf161c581b45c104c87a
Author: Wang Di <di.wang@intel.com>
Date: Mon Aug 11 13:46:38 2014 -0700
LU-3536 lod: write updates to update log
For cross-MDT operation, LOD will write updates into the
update log on all of MDTs.
1. In transaction start, LOD prepares the update records
buffer for cross-MDT operation.
2. Sub LOD collects all updates in execution phase.
3. In transaction stop, LOD will write these updates as
llog record on all of MDTs.
Change-Id: Ibba79267393db00ba05e0aa2df9865f88149eaa4
Signed-off-by: Wang Di <di.wang@intel.com>
This means the client won't be able to mount until all MDTs have started. I'm not sure whether this is by design or a defect. If it's a defect, we should fix it rather than the test script. |
| Comment by Di Wang [ 31/Mar/15 ] |
|
Yes, that is the rule: if the failover MDT cannot connect to all of the other MDTs, it has to wait until the administrator aborts the recovery. I will fix the test script for now. |
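The rule stated in this comment can be summarized as a small decision function. This is a hedged sketch, not Lustre internals: the function name `toy_mdt_recovery_state`, its arguments, and the state names it prints are all hypothetical and chosen only to restate the rule.

```shell
#!/bin/bash
# Toy decision function; name, arguments, and outputs are illustrative only.
# Usage: toy_mdt_recovery_state <connected_peers> <total_peers> <aborted: 0|1>
toy_mdt_recovery_state() {
	local connected=$1 total=$2 aborted=$3
	if [ "$connected" -ge "$total" ]; then
		echo "ready"                # connected to all of the other MDTs
	elif [ "$aborted" -eq 1 ]; then
		echo "recovery-aborted"     # administrator aborted the recovery
	else
		echo "waiting"              # keep waiting for the remaining MDTs
	fi
}
```

For example, `toy_mdt_recovery_state 0 1 0` prints `waiting`, which corresponds to the situation in test_41a where only one MDT was started and nobody aborted the recovery.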