Lustre / LU-6291

conf-sanity test_41a: failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.8.0
    • Fix Version/s: Lustre 2.8.0
    • 3
    • 17624

    Description

      This issue was created by maloo for wangdi <di.wang@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/79052dac-bd31-11e4-8d85-5254006e85c2.

      The sub-test test_41a failed with the following error:

      test failed to respond and timed out
      

      Info required for matching: conf-sanity 41a

          Activity

            di.wang Di Wang (Inactive) added a comment:

            Yes, that is the rule: if the failover MDT cannot connect to all of the other MDTs, it has to wait until the administrator aborts recovery. I will fix the test script for now.
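The rule above can be illustrated with a toy model. This is a minimal sketch, not Lustre's recovery code: the enum, the function name failover_mdt_recovery_state() and the peers_connected array are hypothetical, and aborting recovery is reduced to a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of recovery on a restarted (failover) MDT. */
enum recovery_state {
    RECOVERY_WAITING,   /* at least one other MDT is not connected yet */
    RECOVERY_COMPLETE,  /* every other MDT is connected */
    RECOVERY_ABORTED    /* the administrator aborted recovery */
};

/*
 * Hypothetical helper: peers_connected[i] is true once the failover MDT
 * has connected to peer MDT i; admin_abort models an administrator
 * aborting recovery.
 */
enum recovery_state failover_mdt_recovery_state(const bool *peers_connected,
                                                size_t npeers,
                                                bool admin_abort)
{
    assert(npeers == 0 || peers_connected != NULL);

    if (admin_abort)
        return RECOVERY_ABORTED;

    for (size_t i = 0; i < npeers; i++) {
        if (!peers_connected[i])
            return RECOVERY_WAITING;  /* a single missing peer blocks recovery */
    }
    return RECOVERY_COMPLETE;
}
```

With every peer connected the model reports completion; with any peer missing it keeps waiting, and only the abort flag lets it move on.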

            niu Niu Yawei (Inactive) added a comment:

            The -EAGAIN error was introduced by:

            commit daa691c4dcbc5a70acf4cf161c581b45c104c87a
            Author: Wang Di <di.wang@intel.com>
            Date:   Mon Aug 11 13:46:38 2014 -0700
            
                LU-3536 lod: write updates to update log
            
                For cross-MDT operation, LOD will write updates into the
                update log on all of MDTs.
            
                1. In transaction start, LOD perpare the update records
                   buffer for cross-MDT operation.
                2. Sub LOD collects all updates in execution phase.
                3. In transaction stop, LOD will write thse updates as
                   llog record on all of MDTs.
            
                Change-Id: Ibba79267393db00ba05e0aa2df9865f88149eaa4
                Signed-off-by: Wang Di <di.wang@intel.com>

            This means a client cannot mount until all MDTs have started. I'm not sure whether that is by design or a defect; if it's a defect, we should fix it rather than the test script.
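The three steps in the quoted commit message can be pictured with a simplified model. Everything here is hypothetical (struct txn, struct mdt_log, txn_start(), txn_add_update() and txn_stop() are invented names, not LOD or llog APIs): updates are buffered in memory during execution and are only written out, as one record per MDT log, when the transaction stops.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_UPDATES 8   /* capacity of the hypothetical update buffer */
#define MAX_MDTS    4   /* arbitrary upper bound for this model */

/* Hypothetical in-memory buffer of update records. */
struct update_buf {
    int  count;
    char ops[MAX_UPDATES][32];
};

/* Hypothetical stand-in for one MDT's update log. */
struct mdt_log {
    int nrecords;  /* records appended to this log */
    int nupdates;  /* total updates carried by those records */
};

struct txn {
    struct update_buf buf;
    int started;
};

/* Step 1, transaction start: prepare an empty update buffer. */
void txn_start(struct txn *th)
{
    memset(th, 0, sizeof(*th));
    th->started = 1;
}

/* Step 2, execution: collect one update into the buffer. */
int txn_add_update(struct txn *th, const char *op)
{
    assert(th->started);
    if (th->buf.count >= MAX_UPDATES)
        return -1;  /* buffer full in this fixed-size model */
    snprintf(th->buf.ops[th->buf.count], sizeof(th->buf.ops[0]), "%s", op);
    th->buf.count++;
    return 0;
}

/* Step 3, transaction stop: append the collected updates as one record
 * to the log on every MDT. */
void txn_stop(struct txn *th, struct mdt_log *logs, int nmdts)
{
    assert(th->started);
    assert(nmdts <= MAX_MDTS);
    for (int i = 0; i < nmdts; i++) {
        logs[i].nrecords += 1;
        logs[i].nupdates += th->buf.count;
    }
    th->started = 0;
}
```

The operation names passed to txn_add_update() (for example "create") are only illustrative strings.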

            niu Niu Yawei (Inactive) added a comment:

            That's a test script problem. In a DNE environment this test (conf-sanity 41a) starts only the master MDT, so the client mount always fails with -EAGAIN. See tgt_handle_connect() -> mdt_obd_connect() -> lod_obd_get_info().

            I think we should change the test script to replace "start $SINGLEMDS" with "start_mds". Some other tests, 41b for instance, have the same problem.

            People

              niu Niu Yawei (Inactive)
              maloo Maloo
              Votes: 0
              Watchers: 3
