[LU-6291] conf-sanity test_41a: failed to respond and timed out

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.8.0
    • Lustre 2.8.0
    • 3
    • 17624

    Description

      This issue was created by maloo for wangdi <di.wang@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/79052dac-bd31-11e4-8d85-5254006e85c2.

      The sub-test test_41a failed with the following error:

      test failed to respond and timed out
      


      Info required for matching: conf-sanity 41a

          Activity

            di.wang Di Wang added a comment -

            Yes, that is the rule: if the failover MDT cannot connect to all of the other MDTs, it has to wait until the administrator aborts the recovery. I will fix the test script for now.

            niu Niu Yawei (Inactive) added a comment -

            The -EAGAIN error was introduced by:

            commit daa691c4dcbc5a70acf4cf161c581b45c104c87a
            Author: Wang Di <di.wang@intel.com>
            Date:   Mon Aug 11 13:46:38 2014 -0700

                LU-3536 lod: write updates to update log

                For cross-MDT operation, LOD will write updates into the
                update log on all of MDTs.

                1. In transaction start, LOD perpare the update records
                   buffer for cross-MDT operation.
                2. Sub LOD collects all updates in execution phase.
                3. In transaction stop, LOD will write thse updates as
                   llog record on all of MDTs.

                Change-Id: Ibba79267393db00ba05e0aa2df9865f88149eaa4
                Signed-off-by: Wang Di <di.wang@intel.com>

            This means a client cannot mount until all MDTs have started. I'm not sure whether that is by design or a defect. If it is a defect, we should fix it rather than the test script.

            niu Niu Yawei (Inactive) added a comment -

            That's a test script problem. In a DNE environment, this test (conf-sanity 41a) starts only the master MDT, so the client mount will always fail with -EAGAIN. See tgt_handle_connect() -> mdt_obd_connect() -> lod_obd_get_info().

            I think we should change the test script to replace "start $SINGLEMDS" with "start_mds". Some other tests have the same problem, 41b for instance.
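
The failure mode described in the comments above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Lustre code: the class `ToyCluster` and its methods are hypothetical names invented for illustration, and the single check in `client_connect()` stands in for whatever the real call path (tgt_handle_connect() -> mdt_obd_connect() -> lod_obd_get_info()) actually does. It shows why starting only the first MDT leaves the client mount failing with -EAGAIN, while starting every MDT lets it succeed.

```python
import errno


class ToyCluster:
    """Hypothetical toy model of a DNE setup; not the real Lustre logic."""

    def __init__(self, mdt_count):
        self.mdt_count = mdt_count
        self.started = set()  # indices of MDTs that are running

    def start_mdt(self, index):
        """Start one MDT (loosely analogous to 'start $SINGLEMDS')."""
        if not 0 <= index < self.mdt_count:
            raise ValueError("no such MDT: %d" % index)
        self.started.add(index)

    def start_all_mdts(self):
        """Start every MDT (loosely analogous to 'start_mds')."""
        for index in range(self.mdt_count):
            self.start_mdt(index)

    def client_connect(self):
        """Return 0 on success, or -EAGAIN while any MDT is still down."""
        if len(self.started) < self.mdt_count:
            return -errno.EAGAIN
        return 0


cluster = ToyCluster(mdt_count=2)

cluster.start_mdt(0)                  # only the master MDT is up
only_master = cluster.client_connect()

cluster.start_all_mdts()              # the proposed test-script change
all_started = cluster.client_connect()
```

In this model `only_master` is `-errno.EAGAIN` and `all_started` is `0`. With a single MDT (`mdt_count=1`) the two cases coincide, which matches the comment that the problem only shows up in a DNE environment.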

            People

              niu Niu Yawei (Inactive)
              maloo Maloo