[LU-7481] Failover: recovery-mds-scale test_failover_mds: /dev/lvm-Role_MDS/P1 failed to initialize! Created: 25/Nov/15  Updated: 20/Nov/17  Resolved: 24/Jan/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None
Environment:

EL7 Server/SLES11 SP3 Client, Build# 3251


Issue Links:
Duplicate
duplicates LU-8729 conf-sanity test_84: FAIL: /dev/mappe... Resolved
duplicates LU-7428 conf-sanity test_84, replay-dual 0a: ... Resolved
Related
is related to LU-9059 mount.lustre FATAL: unhandled/unloade... Resolved
Severity: 3

 Description   

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/c010ef0a-906d-11e5-a833-5254006e85c2.

The sub-test test_failover_mds failed with the following error:

/dev/lvm-Role_MDS/P1 failed to initialize!

Test log:

Update not seen after 90s: wanted '' got 'lustre:MDT0000'
 recovery-mds-scale test_failover_mds: @@@@@@ FAIL: /dev/lvm-Role_MDS/P1 failed to initialize! 
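
The "Update not seen after 90s" line suggests a poll-until-match check on the device label. The sketch below is not the actual test-framework code. It assumes that a not-yet-registered target carries a label with a ':' separator (as in 'lustre:MDT0000') and that the check passes once that form is no longer seen (wanted ''). The names wait_update, unregistered_part and read_label, and the regular expression, are hypothetical.

```python
import re
import time
from typing import Callable, Tuple

# Assumption: an unregistered target label looks like 'fsname:MDT0000'.
UNREGISTERED = re.compile(r":[a-zA-Z]{3}[0-9]{4}")


def unregistered_part(label: str) -> str:
    """Return the label if it still has the ':' form, otherwise ''."""
    return label if UNREGISTERED.search(label) else ""


def wait_update(
    read_label: Callable[[], str],
    wanted: str = "",
    timeout: float = 90.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, str]:
    """Poll read_label() until its filtered value equals `wanted` or time runs out.

    Returns (True, got) on a match and (False, got) on timeout, where `got`
    is the last filtered value that was observed.
    """
    deadline = time.monotonic() + timeout
    while True:
        got = unregistered_part(read_label())
        if got == wanted:
            return True, got
        if time.monotonic() >= deadline:
            return False, got
        sleep(interval)
```

Under these assumptions, a label that stays at 'lustre:MDT0000' for the whole timeout yields (False, 'lustre:MDT0000'), which corresponds to the "wanted '' got 'lustre:MDT0000'" text in the log.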

Looks similar to TEI-3682



 Comments   
Comment by Andreas Dilger [ 26/Nov/15 ]

I suspect this is a duplicate of LU-7428.

Comment by Saurabh Tandan (Inactive) [ 04/May/16 ]

Reopening the issue: even after LU-7428 was resolved, this issue still persists.

Comment by Sarah Liu [ 10/Aug/16 ]

Hi Saurabh,

The patch for LU-7428 landed on 13/Jul/16, but you reopened this ticket on 04/May/16, so I think the patch had not yet landed when you reopened it. Closing for now as a dup of LU-7428.

Comment by Saurabh Tandan (Inactive) [ 07/Sep/16 ]

Still hitting this on master (2.8.57 commit 9f36c9a2fc6f8cfc99):
https://testing.hpdd.intel.com/test_sessions/a3cd7814-72ac-11e6-8afd-5254006e85c2

Comment by Gerrit Updater [ 24/Nov/16 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/23935
Subject: LU-7481 utils: commit device label
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 27d82e7f30b105781565915a289545e222a7c5cb

Comment by Hongchao Zhang [ 24/Nov/16 ]

This looks like a similar problem, but not the same one described in LU-7428, which was caused in "osd_ro". recovery-mds-scale has no such operation; it only fails over the MDS (or OST), which could cause such a problem.

I have investigated the "e2fsprogs" code and found that it only reads the buffer of the device's super_block and writes it back after some modification. If there are no super_block-related operations in the corresponding EXT4 filesystem, the buffer won't be written back to disk along with the journal, which explains why the freeze operation added in the patch http://review.whamcloud.com/20586/ for LU-7428 fixes the problem (ext4_freeze commits the super_block).

I created a tentative patch at http://review.whamcloud.com/23935
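
The reasoning above can be illustrated with a toy model. This is not e2fsprogs or ext4 code; the class ToyDevice and all of its methods are hypothetical. It only mirrors the description: the label change lands in an in-memory super_block buffer, a journal commit writes that buffer out only if the filesystem itself touched the super_block, and a freeze commits it unconditionally.

```python
class ToyDevice:
    """Toy model of an on-disk super_block label and its in-memory buffer."""

    def __init__(self, label: str) -> None:
        self.disk_label = label      # what survives a failover
        self.buffer_label = label    # in-memory copy of the super_block
        self.sb_in_journal = False   # did the filesystem touch the super_block?

    def userspace_relabel(self, label: str) -> None:
        # The tool reads the super_block buffer, modifies it, and writes it
        # back to the buffer; nothing forces it to disk at this point.
        self.buffer_label = label

    def fs_superblock_op(self) -> None:
        # A super_block-related operation inside the filesystem.
        self.sb_in_journal = True

    def journal_commit(self) -> None:
        # The buffer reaches disk only if it is part of the transaction.
        if self.sb_in_journal:
            self.disk_label = self.buffer_label
            self.sb_in_journal = False

    def freeze(self) -> None:
        # Models the statement that ext4_freeze commits the super_block.
        self.disk_label = self.buffer_label

    def failover(self) -> None:
        # Abrupt failover: the in-memory state is lost, disk state remains.
        self.buffer_label = self.disk_label
        self.sb_in_journal = False


def label_after_failover(freeze_first: bool) -> str:
    """Relabel, optionally freeze, then fail over and report the label."""
    dev = ToyDevice("lustre:MDT0000")
    dev.userspace_relabel("lustre-MDT0000")
    dev.journal_commit()  # no super_block operation happened, so no effect
    if freeze_first:
        dev.freeze()
    dev.failover()
    return dev.disk_label
```

In this model, label_after_failover(False) leaves the old 'lustre:MDT0000' label on disk, while label_after_failover(True) keeps the new one, which is the behavior the comment attributes to the freeze added for LU-7428.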

Comment by Gerrit Updater [ 12/Jan/17 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/24845
Subject: LU-7481 utils: label lustre device correctly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1264e511ddf2229a3feaeda52323720176254be7

Comment by Gerrit Updater [ 24/Jan/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24845/
Subject: LU-7481 utils: label lustre device correctly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a6e430e81669a6ab40ecae9b416dd2cdee45908c

Comment by Peter Jones [ 24/Jan/17 ]

Landed for 2.10

Generated at Sat Feb 10 02:09:16 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.