[LU-4433] mds-survey test_1: FAIL: OST lustre-MDT0001 not setup Created: 04/Jan/14 Updated: 20/Nov/17 Resolved: 22/Aug/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0, Lustre 2.6.0, Lustre 2.5.1, Lustre 2.5.3, Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jian Yu | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne | ||
| Environment: |
Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/5/ |
||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 12177 | ||||||||||||
| Description |
|
While running mds-survey in DNE mode, test 1 failed as follows:
== mds-survey test 1: Metadata survey with zero-stripe == 18:52:57 (1388717577)
CMD: client-7vm3 lctl dl
+ file_count=102962 thrlo=1 thrhi=8 dir_count=4 layer=mdd stripe_count=0 rslt_loc=/tmp targets="client-7vm3:lustre-MDT0000 lustre-MDT0001" /usr/bin/mds-survey
Warning: Permanently added 'client-7vm3,10.10.4.236' (RSA) to the list of known hosts.
OST lustre-MDT0001 not setup
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0      0 1279000 126456 355052    0    0     6    25  490  332  1  6 90  2  1
mds-survey test_1: @@@@@@ FAIL: mds-survey failed
Maloo report: https://maloo.whamcloud.com/test_sets/64322640-749a-11e3-8b21-52540035b04c |
| Comments |
| Comment by Jian Yu [ 04/Jan/14 ] |
|
In DNE mode, mds-survey test 2 failed as follows:
== mds-survey test 2: Metadata survey with stripe_count = 1 == 18:53:03 (1388717583)
CMD: client-7vm3 lctl dl
+ file_count=102962 thrlo=1 thrhi=8 dir_count=4 layer=mdd stripe_count=1 rslt_loc=/tmp targets="client-7vm3:lustre-MDT0000 lustre-MDT0001" /usr/bin/mds-survey
Need obdfilter to test stripe_count
cat: /tmp/mds_survey*: No such file or directory
mds-survey test_2: @@@@@@ FAIL: mds-survey failed
Maloo report: https://maloo.whamcloud.com/test_sets/64322640-749a-11e3-8b21-52540035b04c |
| Comment by Jian Yu [ 07/Mar/14 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_5/39/ (2.5.1 RC1) The same failure occurred: |
| Comment by Sarah Liu [ 26/Jul/14 ] |
|
Hit this in b2_6-rc2 DNE testing.
client and server: lustre-b2_6-rc2
1 MDS with 2 MDTs
https://testing.hpdd.intel.com/test_sets/b99b1d4e-14ef-11e4-bb6a-5254006e85c2
== mds-survey test 1: Metadata survey with zero-stripe == 22:17:04 (1406351824)
CMD: onyx-46vm7 lctl dl
+ file_count=102960 thrlo=1 thrhi=8 dir_count=4 layer=mdd stripe_count=0 rslt_loc=/tmp targets="onyx-46vm7:lustre-MDT0000 lustre-MDT0001" /usr/bin/mds-survey
Warning: Permanently added 'onyx-46vm7' (RSA) to the list of known hosts.
OST lustre-MDT0001 not setup
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1386232 136280 238416    0    0    14   114 1219  687  1 10 85  4  0
mds-survey test_1: @@@@@@ FAIL: mds-survey failed |
| Comment by Jian Yu [ 31/Aug/14 ] |
|
Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/86/ (2.5.3 RC1) The same failure occurred: https://testing.hpdd.intel.com/test_sets/3a8c8fc2-308a-11e4-a3d9-5254006e85c2 |
| Comment by Sarah Liu [ 14/Mar/16 ] |
|
same issue hit on master/#3324 DNE mode |
| Comment by Jian Yu [ 11/Apr/16 ] |
|
Hi Di,
Does /usr/bin/mds-survey support multiple MDT targets? While running the following command on master branch:
file_count=117549 thrlo=1 thrhi=8 dir_count=4 layer=mdd stripe_count=0 rslt_loc=/tmp targets="eagle-36vm6:lustre-MDT0000 eagle-36vm6:lustre-MDT0002 eagle-37vm5:lustre-MDT0001 eagle-37vm5:lustre-MDT0003" /usr/bin/mds-survey
I hit the following errors:
Mon Apr 11 01:28:35 UTC 2016 /usr/bin/mds-survey from eagle-30vm6
error: test_mkdir: Object is remote
ERROR: fail test_mkdir
created directories on eagle-36vm6:lustre-MDT0002_ecc failed
program exited with error
=======> Create 4 directories on eagle-36vm6:lustre-MDT0000_ecc
=======> Create 4 directories on eagle-36vm6:lustre-MDT0002_ecc
error: test_mkdir: Object is remote
Mon Apr 11 01:28:35 UTC 2016 /usr/bin/mds-survey from eagle-30vm6
created directories on eagle-36vm6:lustre-MDT0002_ecc failed |
| Comment by Di Wang [ 11/Apr/16 ] |
|
Hmm, if I remember correctly, b2_5 should support mds-survey on multiple MDTs. Master definitely does. Maybe there is a bug in b2_5. |
| Comment by Jian Yu [ 11/Apr/16 ] |
|
Hi Di, |
| Comment by Gerrit Updater [ 11/Apr/16 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/19437 |
| Comment by Jian Yu [ 11/Apr/16 ] |
|
Hi Di, The above patch just fixes the mds-survey.sh test script to support multiple MDTs. mds-survey itself still fails with multiple MDTs. |
| Comment by Di Wang [ 12/Apr/16 ] |
|
Could you please re-run with debug-level = -1? thanks. |
| Comment by Jian Yu [ 13/Apr/16 ] |
|
Hi Di, Here is the report with debug=-1 and debug_mb=150: |
| Comment by Di Wang [ 15/Apr/16 ] |
|
According to the debug log, creation fails in lod_declare_object_create()
00000004:00000001:0.0:1460512310.071969:0:15039:0:(lod_object.c:3470:lod_declare_object_create()) Process leaving via out (rc=18446744073709551550 : -66 : 0xffffffffffffffbe)
00000004:00000001:0.0:1460512310.071972:0:15039:0:(lod_object.c:3508:lod_declare_object_create()) Process leaving (rc=18446744073709551550 : -66 : ffffffffffffffbe)
00000004:00000001:0.0:1460512310.071974:0:15039:0:(mdd_object.c:374:mdd_declare_object_create_internal()) Process leaving (rc=18446744073709551550 : -66 : ffffffffffffffbe)
So in lod_declare_object_create()
static int lod_declare_object_create(const struct lu_env *env,
struct dt_object *dt,
struct lu_attr *attr,
struct dt_allocation_hint *hint,
struct dt_object_format *dof,
struct thandle *th)
{
...........
/* If the parent has default stripeEA, and client
* did not find it before sending create request,
* then MDT will return -EREMOTE, and client will
* retrieve the default stripeEA and re-create the
* sub directory.
*
* Note: if dah_eadata != NULL, it means creating the
* striped directory with specified stripeEA, then it
* should ignore the default stripeEA */
if (hint != NULL && hint->dah_eadata == NULL) {
if (OBD_FAIL_CHECK(OBD_FAIL_MDS_STALE_DIR_LAYOUT))
GOTO(out, rc = -EREMOTE);
........
}
If the create request does not carry an EA, and the parent does not have a default stripe EA either, then the newly created child has to be on the same MDT as its parent; otherwise the create returns -EREMOTE, see
So there are two options to fix the problem:
1. If the echo client wants to create a remote directory, then add lum into the creation spec, see echo_create_md_object().
Either way is fine to me. |
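The rule described in the comment above, and the way the logged rc maps to -EREMOTE, can be illustrated with a small standalone sketch. This is not Lustre code: the struct, the function names, and the SKETCH_EREMOTE constant are all hypothetical, and the value 66 is assumed to be the Linux errno for EREMOTE ("Object is remote"). Only the case spelled out in the comment is modeled (no EA in the request and no default stripe EA on the parent); every other combination is simply treated as allowed here, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed Linux value of EREMOTE; hard-coded because it differs elsewhere. */
#define SKETCH_EREMOTE 66

/* Hypothetical, simplified view of a create request; not a Lustre type. */
struct sketch_create_req {
	bool has_ea;             /* request carries a stripe EA (lum) */
	bool parent_has_default; /* parent has a default stripe EA */
	int  parent_mdt;         /* index of the MDT holding the parent */
	int  target_mdt;         /* index of the MDT the child is created on */
};

/*
 * Reinterpret the unsigned 64-bit rc printed in the debug log as a signed
 * value. memcpy avoids relying on implementation-defined integer conversion.
 */
static int64_t sketch_decode_rc(uint64_t raw)
{
	int64_t rc;

	memcpy(&rc, &raw, sizeof(rc));
	return rc;
}

/*
 * Model only the case from the comment: without an EA in the request and
 * without a default stripe EA on the parent, the child must stay on the
 * parent's MDT, otherwise the create is rejected with -EREMOTE.
 */
static int sketch_check_placement(const struct sketch_create_req *req)
{
	if (!req->has_ea && !req->parent_has_default &&
	    req->target_mdt != req->parent_mdt)
		return -SKETCH_EREMOTE;

	return 0; /* other cases are not modeled in this sketch */
}

/* Self-check against the rc value quoted in the debug log. */
static void sketch_self_check(void)
{
	assert(sketch_decode_rc(18446744073709551550ULL) == -SKETCH_EREMOTE);
}
```

Calling sketch_check_placement() with has_ea = false, parent_has_default = false, parent_mdt = 0 and target_mdt = 2 mirrors the failing mds-survey run against lustre-MDT0002, while setting has_ea = true corresponds to option 1 (passing a lum in the creation spec).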
| Comment by Peter Jones [ 05/Aug/16 ] |
|
Reassigning to Lai |
| Comment by nasf (Inactive) [ 15/Aug/16 ] |
|
Take it as Peter suggested. |
| Comment by nasf (Inactive) [ 15/Aug/16 ] |
|
Updated the patch http://review.whamcloud.com/#/c/19437/. Hope it can pass Maloo test. |
| Comment by Gerrit Updater [ 22/Aug/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19437/ |
| Comment by nasf (Inactive) [ 22/Aug/16 ] |
|
The patch has landed on master. |
| Comment by Gerrit Updater [ 13/Oct/16 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: http://review.whamcloud.com/23146 |
| Comment by James A Simmons [ 13/Oct/16 ] |
|
Sorry mistyped |