
Failover: recovery-mds-scale test_failover_mds: /dev/lvm-Role_MDS/P1 failed to initialize!

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Environment: EL7 Server/SLES11 SP3 Client, Build# 3251
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/c010ef0a-906d-11e5-a833-5254006e85c2.

      The sub-test test_failover_mds failed with the following error:

      /dev/lvm-Role_MDS/P1 failed to initialize!
      

      Test log:

      Update not seen after 90s: wanted '' got 'lustre:MDT0000'
       recovery-mds-scale test_failover_mds: @@@@@@ FAIL: /dev/lvm-Role_MDS/P1 failed to initialize! 
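
      The log line above has the shape of a poll-until-timeout check: a value is sampled repeatedly, and the test fails if it never becomes the wanted value within 90s. A minimal Python sketch of that pattern follows. The names wait_update and check, the polling interval, and the injectable clock are hypothetical and chosen for illustration; this is not the Lustre test framework's actual code.

```python
import time


def wait_update(check, wanted, timeout_s=90.0, interval_s=1.0,
                clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns `wanted` or `timeout_s` elapses.

    Returns (ok, message). All names are hypothetical; the failure
    message only mirrors the shape of the line in the test log.
    """
    start = clock()
    got = check()
    while got != wanted:
        if clock() - start >= timeout_s:
            return False, (f"Update not seen after {timeout_s:.0f}s: "
                           f"wanted '{wanted}' got '{got}'")
        sleep(interval_s)
        got = check()
    return True, f"Updated after {clock() - start:.0f}s"
```

      In the failing run, the sampled value stayed at 'lustre:MDT0000' for the whole window while the wanted value was the empty string, so a helper like this would return the failure branch.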
      

      Looks similar to TEI-3682

          Activity

            [LU-7481] Failover: recovery-mds-scale test_failover_mds: /dev/lvm-Role_MDS/P1 failed to initialize!
            pjones Peter Jones added a comment -

            Landed for 2.10


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24845/
            Subject: LU-7481 utils: label lustre device correctly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a6e430e81669a6ab40ecae9b416dd2cdee45908c

            gerrit Gerrit Updater added a comment -

            Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/24845
            Subject: LU-7481 utils: label lustre device correctly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1264e511ddf2229a3feaeda52323720176254be7

            hongchao.zhang Hongchao Zhang added a comment -

            This looks similar to, but not the same as, the problem described in LU-7428, which arose in "osd_ro". recovery-mds-scale performs no such operation;
            it only fails over the MDS (or OST), which could cause this problem.

            I investigated the "e2fsprogs" code and found that it only reads the buffer of the device's super_block and writes it back after some modification.
            If there are no super_block-related operations in the corresponding EXT4 filesystem, the buffer won't be written back to disk along with the journal. That
            explains why the freeze operation added by the patch http://review.whamcloud.com/20586/ in LU-7428 fixes the problem (ext4_freeze commits the super_block).

            I created a tentative patch at http://review.whamcloud.com/23935
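
            The explanation in the comment above can be illustrated with a toy model: a label change that sits in a superblock buffer is lost on failover unless something commits the superblock first. The sketch below is not ext4 or e2fsprogs code. The class, its method names, and the new label value 'lustre-MDT0000' are assumptions made for illustration; only 'lustre:MDT0000' comes from the test log.

```python
class ToyDevice:
    """Toy model of a device with a buffered superblock (hypothetical).

    It only captures the idea from the comment: a modified superblock
    buffer does not reach disk until the superblock is committed.
    """

    def __init__(self, label):
        self.on_disk_label = label    # what survives a failover
        self.buffered_label = label   # what the superblock buffer holds

    def set_label(self, label):
        # Models relabeling: only the superblock buffer is modified.
        self.buffered_label = label

    def commit_superblock(self):
        # Models an operation that writes the superblock to disk
        # (per the comment, a freeze does this).
        self.on_disk_label = self.buffered_label

    def failover(self):
        # Models an abrupt failover: uncommitted buffer contents are lost.
        self.buffered_label = self.on_disk_label
        return self.on_disk_label


# Without a commit, the old label is what the next mount would see.
lost = ToyDevice("lustre:MDT0000")
lost.set_label("lustre-MDT0000")      # assumed new label, for illustration
label_after_failover = lost.failover()

# With a commit before failover, the new label persists.
kept = ToyDevice("lustre:MDT0000")
kept.set_label("lustre-MDT0000")
kept.commit_superblock()
label_after_commit = kept.failover()
```

            Under this model, the first device still reports the old label after failover, which is consistent with the test continuing to see 'lustre:MDT0000'.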

            gerrit Gerrit Updater added a comment -

            Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/23935
            Subject: LU-7481 utils: commit device label
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 27d82e7f30b105781565915a289545e222a7c5cb

            standan Saurabh Tandan (Inactive) added a comment - - edited

            Still hitting on master (2.8.57 commit 9f36c9a2fc6f8cfc99):
            https://testing.hpdd.intel.com/test_sessions/a3cd7814-72ac-11e6-8afd-5254006e85c2

            sarah Sarah Liu added a comment -

            Hi Saurabh,

            The patch for LU-7428 landed on 13/Jul/16, while you reopened this ticket on 04/May/16, so I think the patch had not landed yet when you reopened the ticket. Closing for now as a dup of LU-7428.

            standan Saurabh Tandan (Inactive) added a comment -

            Reopening the issue, as it still persists even after LU-7428 has been resolved.

            adilger Andreas Dilger added a comment -

            I suspect this is a duplicate of LU-7428.

            People

              Assignee: Hongchao Zhang
              Reporter: Maloo
              Votes: 0
              Watchers: 10
