[LU-4433] mds-survey test_1: FAIL: OST lustre-MDT0001 not setup


    Description

      While running mds-survey in DNE mode, test 1 failed as follows:

      == mds-survey test 1: Metadata survey with zero-stripe == 18:52:57 (1388717577)
      CMD: client-7vm3 lctl dl
      + file_count=102962 thrlo=1 thrhi=8 dir_count=4 layer=mdd stripe_count=0 rslt_loc=/tmp targets="client-7vm3:lustre-MDT0000 lustre-MDT0001" /usr/bin/mds-survey
      Warning: Permanently added 'client-7vm3,10.10.4.236' (RSA) to the list of known hosts.
      OST lustre-MDT0001 not setup
      procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
       r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
       3  0      0 1279000 126456 355052    0    0     6    25  490  332  1  6 90  2  1	
       mds-survey test_1: @@@@@@ FAIL: mds-survey failed
      

      Maloo report: https://maloo.whamcloud.com/test_sets/64322640-749a-11e3-8b21-52540035b04c


          Activity


            simmonsja James A Simmons added a comment -

            Sorry, mistyped.

            gerrit Gerrit Updater added a comment -

            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: http://review.whamcloud.com/23146
            Subject: LU-4433 obd: use ktime_t for calculating elapsed time
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c52ac644ab7bf1d3d9fe3fac5918ca2d2d584b80

            yong.fan nasf (Inactive) added a comment -

            The patch has been landed to master.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19437/
            Subject: LU-4433 tests: fix mds-survey.sh to support multiple MDTs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 530b16b187de9a27723205d9d759f260bfd350b8

            yong.fan nasf (Inactive) added a comment -

            Updated the patch http://review.whamcloud.com/#/c/19437/. Hopefully it will pass Maloo testing.

            yong.fan nasf (Inactive) added a comment -

            Taking this, as Peter suggested.
            pjones Peter Jones added a comment -

            Reassigning to Lai
            di.wang Di Wang added a comment -

            According to the debug log, the creation fails in lod_declare_object_create() with rc = -66 (-EREMOTE):

            00000004:00000001:0.0:1460512310.071969:0:15039:0:(lod_object.c:3470:lod_declare_object_create()) Process leaving via out (rc=18446744073709551550 : -66 : 0xffffffffffffffbe)
            00000004:00000001:0.0:1460512310.071972:0:15039:0:(lod_object.c:3508:lod_declare_object_create()) Process leaving (rc=18446744073709551550 : -66 : ffffffffffffffbe)
            00000004:00000001:0.0:1460512310.071974:0:15039:0:(mdd_object.c:374:mdd_declare_object_create_internal()) Process leaving (rc=18446744073709551550 : -66 : ffffffffffffffbe)
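Each rc field in these log lines prints the same 64-bit return code three ways: as unsigned decimal, as signed decimal, and as hex. The C sketch below uses hypothetical helpers (rc_from_unsigned() and print_rc() are not Lustre functions) to show how 18446744073709551550 decodes to -66. It assumes a Linux host, where EREMOTE is 66; errno values differ on other platforms.

```c
/* Hypothetical helpers; assumes Linux errno values (EREMOTE == 66). */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Convert a return code printed as an unsigned 64-bit value back to the
 * signed value, without relying on implementation-defined conversions. */
long long rc_from_unsigned(uint64_t raw)
{
	long long rc;

	if (raw <= (uint64_t)INT64_MAX)
		rc = (long long)raw;                        /* non-negative rc */
	else
		rc = -(long long)(UINT64_MAX - raw) - 1;    /* two's complement */

	/* signed-to-unsigned conversion wraps modulo 2^64, so this must hold */
	assert((uint64_t)rc == raw);
	return rc;
}

/* Print the value in the log's "unsigned : signed : hex" style, plus the
 * C library's errno text for negative codes. */
void print_rc(uint64_t raw)
{
	long long rc = rc_from_unsigned(raw);

	printf("rc=%llu : %lld : 0x%llx", (unsigned long long)raw, rc,
	       (unsigned long long)raw);
	if (rc < 0)
		printf(" (%s)", strerror((int)-rc));
	printf("\n");
}
```

For example, print_rc(UINT64_C(18446744073709551550)) should print rc=18446744073709551550 : -66 : 0xffffffffffffffbe, followed by the C library's message for EREMOTE.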
            

            So, in lod_declare_object_create():

            static int lod_declare_object_create(const struct lu_env *env,
                                                 struct dt_object *dt,
                                                 struct lu_attr *attr,
                                                 struct dt_allocation_hint *hint,
                                                 struct dt_object_format *dof,
                                                 struct thandle *th)
            {
                    /* ... */
            
                            /* If the parent has default stripeEA, and client
                             * did not find it before sending create request,
                             * then MDT will return -EREMOTE, and client will
                             * retrieve the default stripeEA and re-create the
                             * sub directory.
                             *
                             * Note: if dah_eadata != NULL, it means creating the
                             * striped directory with specified stripeEA, then it
                             * should ignore the default stripeEA */
                            if (hint != NULL && hint->dah_eadata == NULL) {
                                    if (OBD_FAIL_CHECK(OBD_FAIL_MDS_STALE_DIR_LAYOUT))
                                            GOTO(out, rc = -EREMOTE);
            
                                    /* ... */
                            }
                    /* ... */
            }
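The comment in this excerpt describes a retry protocol: if the parent has a default stripe EA that the client did not include in its create request, the MDT returns -EREMOTE, and the client retrieves the default stripe EA and re-creates the subdirectory. OBD_FAIL_MDS_STALE_DIR_LAYOUT forces that path for testing. The C sketch below is a simplified model of the exchange, not Lustre code. Every name in it (struct mkdir_req, server_mkdir(), client_mkdir(), fail_stale_dir_layout) is hypothetical, and "retrieving the default stripe EA" is reduced to setting a flag.

```c
/* Simplified model of the -EREMOTE retry; all names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mkdir_req {
	bool has_stripe_ea;	/* request carries a stripe EA (like dah_eadata != NULL) */
};

/* Stand-in for the fail_loc tested by OBD_FAIL_CHECK(OBD_FAIL_MDS_STALE_DIR_LAYOUT). */
static bool fail_stale_dir_layout;

/* MDT side: reject a create that lacks the stripe EA the client should have sent. */
static int server_mkdir(const struct mkdir_req *req, bool parent_has_default_ea)
{
	if (!req->has_stripe_ea) {
		if (fail_stale_dir_layout)	/* fault injection: pretend the layout is stale */
			return -EREMOTE;
		if (parent_has_default_ea)	/* client missed the parent's default EA */
			return -EREMOTE;
	}
	return 0;
}

/* Client side: on -EREMOTE, "retrieve" the default stripe EA and retry once. */
static int client_mkdir(bool parent_has_default_ea, int *attempts)
{
	struct mkdir_req req = { .has_stripe_ea = false };
	int rc;

	assert(attempts != NULL);
	*attempts = 0;
	for (;;) {
		++*attempts;
		rc = server_mkdir(&req, parent_has_default_ea);
		if (rc != -EREMOTE || req.has_stripe_ea)
			return rc;
		req.has_stripe_ea = true;	/* re-create with the stripe EA attached */
	}
}
```

In this model, a create under a parent without a default EA succeeds on the first attempt. Either a default EA on the parent or the injected failure costs one extra round trip.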
            

            If the create request does not carry a stripe EA and the parent does not have a default stripe EA either, the newly created child must be on the same MDT as its parent; otherwise the create returns -EREMOTE (see LU-6341, http://review.whamcloud.com/13990). Unfortunately, this check did not take the echo client into account.
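The placement rule from LU-6341 described above can be sketched as a single predicate. This is an illustration under assumptions, not the landed code. struct create_spec and check_same_mdt_rule() are hypothetical, and only this one rule is modelled; the default-stripe-EA -EREMOTE path is out of scope.

```c
/* Hypothetical model of the same-MDT placement rule; not Lustre code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct create_spec {
	bool has_stripe_ea;          /* request carries a stripe EA */
	bool parent_has_default_ea;  /* parent directory has a default stripe EA */
	unsigned int parent_mdt;     /* MDT index of the parent */
	unsigned int target_mdt;     /* MDT index chosen for the new child */
};

/* Returns -EREMOTE only when the rule above is violated; 0 otherwise. */
static int check_same_mdt_rule(const struct create_spec *spec)
{
	assert(spec != NULL);
	if (!spec->has_stripe_ea && !spec->parent_has_default_ea &&
	    spec->target_mdt != spec->parent_mdt)
		return -EREMOTE;
	return 0;
}
```

A create like the echo client's, which sends no EA but targets a different MDT than its parent's, corresponds to { false, false, 0, 1 } and gets -EREMOTE.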

            So there are two options to fix the problem:

            1. If the echo client wants to create a remote directory, add a lum to the creation spec; see echo_create_md_object().
            2. Alternatively, add some flags to the spec and hint so that lod_declare_object_create() skips this check for the echo client.
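Both proposed fixes can be expressed against a small model of the same rule. In this hypothetical C sketch, option 1 corresponds to attaching a lum, represented by a stand-in struct lum_stub, so that the request carries an EA. Option 2 corresponds to a hypothetical skip_same_mdt_check flag in the spec. Neither name comes from Lustre, and declare_create() only imitates the check; it is not lod_declare_object_create().

```c
/* Hypothetical model of the two proposed echo-client fixes; not Lustre code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct lum_stub { unsigned int stripe_offset; };  /* stand-in for a real lum */

struct echo_spec {
	const struct lum_stub *lum;   /* option 1: non-NULL means an EA is sent */
	bool skip_same_mdt_check;     /* option 2: hypothetical echo-client flag */
	bool parent_has_default_ea;
	unsigned int parent_mdt;
	unsigned int target_mdt;
};

static int declare_create(const struct echo_spec *spec)
{
	assert(spec != NULL);
	/* either fix takes the create out of the same-MDT rule */
	if (spec->skip_same_mdt_check || spec->lum != NULL)
		return 0;
	if (!spec->parent_has_default_ea && spec->target_mdt != spec->parent_mdt)
		return -EREMOTE;
	return 0;
}
```

Option 1 keeps the MDT-side rule unchanged and makes the echo client look like an ordinary striped-directory create. Option 2 changes the MDT side, so the flag has to be trusted only for echo-client requests.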

            Either way is fine with me.
            yujian Jian Yu added a comment -

            Hi Di,

            Here is the report with debug=-1 and debug_mb=150:
            https://testing.hpdd.intel.com/test_sets/9bb65320-011a-11e6-9ccf-5254006e85c2
            di.wang Di Wang added a comment -

            Could you please re-run with debug=-1? Thanks.

            People

              yong.fan nasf (Inactive)
              yujian Jian Yu