[LU-10384] Replace nids doesn't add failover nid and add_conn string to config Created: 14/Dec/17 Updated: 19/Apr/19 Resolved: 04/Jan/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.1 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Artem Blagodarenko (Inactive) | Assignee: | Artem Blagodarenko (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
There is an MDT-MDT connection problem after failover. Here is the config for MDT0000 (replace_nids was not applied to it):

#10 (224)marker 17 (flags=0x01, v2.5.1.0) lustre-MDT0000 ‘add osp’ Wed Aug 30 10:34:36 2017-
#11 (088)add_uuid nid=192.168.0.112@tcp(0x20000c0a80070) 0: 1:192.168.0.112@tcp
#12 (144)attach 0:lustre-MDT0000-osp-MDT0001 1:osp 2:lustre-MDT0001-mdtlov_UUID
#13 (152)setup 0:lustre-MDT0000-osp-MDT0001 1:lustre-MDT0000_UUID 2:192.168.0.112@tcp
#14 (088)add_uuid nid=192.168.0.113@tcp(0x20000c0a80071) 0: 1:192.168.0.113@tcp
#15 (120)add_conn 0:lustre-MDT0000-osp-MDT0001 1:192.168.0.113@tcp
#16 (136)modify_mdc_tgts add 0:lustre-MDT0001-mdtlov 1:lustre-MDT0000_UUID 2:0 3:1
#17 (224)END marker 17 (flags

And here is the MDT0001 config after replace_nids:

#19 (224)marker 20 (flags=0x01, v2.5.1.0) lustre-MDT0001 ‘add osp’ Wed Aug 30 10:34:36 2017-
#20 (088)add_uuid nid=192.168.0.113@tcp(0x20000c0a80071) 0: 1:192.168.0.113@tcp
#21 (144)attach 0:lustre-MDT0001-osp-MDT0000 1:osp 2:lustre-MDT0000-mdtlov_UUID
#22 (152)setup 0:lustre-MDT0001-osp-MDT0000 1:lustre-MDT0001_UUID 2:192.168.0.113@tcp
#23 (136)modify_mdc_tgts add 0:lustre-MDT0000-mdtlov 1:lustre-MDT0001_UUID 2:1 3:1
#24 (224)END marker 20 (flags=0x02, v2.5.1.0) lustre-MDT0001 ‘add osp’ Wed Aug 30 10:34:36 2017-

replace_nids does not add the failover NID and the add_conn record to the config. This is why the OSP connection cannot be established after failover. The solution is to add an option to replace_nids that adds the failover record. |
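The difference between the two configs can be checked mechanically. The following is only a minimal sketch, not part of Lustre or of the patch: it assumes the config log dump has one record per line in the `#N (size)command args` form quoted in the description, and the helper names `parse_records` and `devices_without_add_conn` are hypothetical. It lists every device that has a `setup` record but no `add_conn` record. Whether a missing `add_conn` is actually a defect depends on whether a failover NID is expected for that device.

```python
import re

# One record per line, e.g. "#15 (120)add_conn 0:<device> 1:<nid>".
# The format is assumed from the dumps quoted in this issue.
RECORD_RE = re.compile(r"#(\d+) \(\d+\)(\w+)\s+(.*)")


def parse_records(dump):
    """Yield (index, command, args) for each record line in a dump."""
    for line in dump.splitlines():
        match = RECORD_RE.match(line.strip())
        if match:
            yield int(match.group(1)), match.group(2), match.group(3)


def devices_without_add_conn(dump):
    """Return devices that have a setup record but no add_conn record."""
    configured = []   # devices seen in setup records, in log order
    has_conn = set()  # devices seen in add_conn records
    for _, command, args in parse_records(dump):
        fields = args.split()
        # Only records whose first argument is "0:<device>" are of interest.
        if not fields or not fields[0].startswith("0:"):
            continue
        device = fields[0][2:]
        if command == "setup" and device not in configured:
            configured.append(device)
        elif command == "add_conn":
            has_conn.add(device)
    return [d for d in configured if d not in has_conn]


# Records copied from the description (marker lines omitted for brevity).
MDT0000_LOG = """\
#11 (088)add_uuid nid=192.168.0.112@tcp(0x20000c0a80070) 0: 1:192.168.0.112@tcp
#12 (144)attach 0:lustre-MDT0000-osp-MDT0001 1:osp 2:lustre-MDT0001-mdtlov_UUID
#13 (152)setup 0:lustre-MDT0000-osp-MDT0001 1:lustre-MDT0000_UUID 2:192.168.0.112@tcp
#14 (088)add_uuid nid=192.168.0.113@tcp(0x20000c0a80071) 0: 1:192.168.0.113@tcp
#15 (120)add_conn 0:lustre-MDT0000-osp-MDT0001 1:192.168.0.113@tcp
"""

MDT0001_LOG = """\
#20 (088)add_uuid nid=192.168.0.113@tcp(0x20000c0a80071) 0: 1:192.168.0.113@tcp
#21 (144)attach 0:lustre-MDT0001-osp-MDT0000 1:osp 2:lustre-MDT0000-mdtlov_UUID
#22 (152)setup 0:lustre-MDT0001-osp-MDT0000 1:lustre-MDT0001_UUID 2:192.168.0.113@tcp
"""

if __name__ == "__main__":
    print(devices_without_add_conn(MDT0000_LOG))
    print(devices_without_add_conn(MDT0001_LOG))
```

For the two excerpts above, the MDT0000 log yields an empty list, while the MDT0001 log reports lustre-MDT0001-osp-MDT0000, the device whose failover connection record was lost after replace_nids.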
| Comments |
| Comment by Gerrit Updater [ 21/Dec/17 ] |
|
Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/30624 |
| Comment by Artem Blagodarenko (Inactive) [ 12/Feb/18 ] |
|
Hello James_Nunez,

[16136.542599] LustreError: 10477:0:(mgc_request.c:1576:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 10.2.8.156@tcp
[16136.543707] Lustre: 10477:0:(mgc_request.c:1802:mgc_process_recover_nodemap_log()) MGC10.2.8.156@tcp: error processing recovery log lustre-mdtir: rc = -2
[16136.545012] LustreError: 10477:0:(mgc_request.c:2132:mgc_process_log()) MGC10.2.8.156@tcp: recover log lustre-mdtir failed, not fatal: rc = -2

10.2.8.156 is the new address that was applied by lctl replace_nids:

[16118.896695] Lustre: DEBUG MARKER: lctl replace_nids lustre-MDT0000 10.2.8.156@tcp
[16119.429006] Lustre: DEBUG MARKER: lctl replace_nids lustre-MDT0001 10.2.8.156@tcp
[16119.748693] Lustre: DEBUG MARKER: lctl replace_nids lustre-OST0000 10.2.8.156@tcp
[16120.067065] Lustre: DEBUG MARKER: lctl replace_nids lustre-OST0001 10.2.8.156@tcp

I cannot yet be sure whether my patch affects the failures of tests 108a and 108b. I am going to investigate the test hangs in this issue. Because I have no ZFS-based installation ready here, and the testing system can easily reproduce this issue, could you help me by sharing some extra files?

Thanks, |
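The link drawn above between the "cannot find uuid by nid" error and the replace_nids calls can be checked on a larger console log with a short script. This is a rough sketch under stated assumptions: the two message formats are taken only from the lines quoted in this comment, the function name `nids_replaced_then_missing` is hypothetical, and the ordering of the messages is ignored.

```python
import re

# Message formats assumed from the console lines quoted in this comment.
MARKER_RE = re.compile(r"DEBUG MARKER: lctl replace_nids (\S+) (\S+)")
ERROR_RE = re.compile(r"cannot find uuid by nid (\S+)")


def nids_replaced_then_missing(log):
    """Map each NID that was set by replace_nids and also reported as
    'cannot find uuid by nid' to the targets it was applied to."""
    replaced = {}    # nid -> list of targets given to replace_nids
    missing = set()  # nids named in 'cannot find uuid' errors
    for line in log.splitlines():
        marker = MARKER_RE.search(line)
        if marker:
            target, nid = marker.group(1), marker.group(2)
            replaced.setdefault(nid, []).append(target)
        error = ERROR_RE.search(line)
        if error:
            missing.add(error.group(1))
    return {nid: targets for nid, targets in replaced.items() if nid in missing}


# Excerpt of the console log quoted in this comment.
SAMPLE_LOG = """\
[16118.896695] Lustre: DEBUG MARKER: lctl replace_nids lustre-MDT0000 10.2.8.156@tcp
[16119.429006] Lustre: DEBUG MARKER: lctl replace_nids lustre-MDT0001 10.2.8.156@tcp
[16136.542599] LustreError: 10477:0:(mgc_request.c:1576:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 10.2.8.156@tcp
"""

if __name__ == "__main__":
    print(nids_replaced_then_missing(SAMPLE_LOG))
```

On the excerpt above, the result maps 10.2.8.156@tcp to lustre-MDT0000 and lustre-MDT0001, which matches the observation that the failing NID is the one applied by replace_nids.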
| Comment by Artem Blagodarenko (Inactive) [ 16/Apr/18 ] |
|
jamesanunez, Maloo set -1 on my patch. I checked locally: config_sanity test_32d fails with "rmmod: ERROR: Module zfs is in use" both with and without my patch. test_75 always passes on my local box (with and without my patch). Can you verify the patch? Thanks. |
| Comment by Artem Blagodarenko (Inactive) [ 16/Apr/18 ] |
|
Same result for test_32a: it fails both with and without my patch. |
| Comment by Gerrit Updater [ 04/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/30624/ |
| Comment by Peter Jones [ 04/Jan/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 25/Feb/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34296 |
| Comment by Gerrit Updater [ 19/Mar/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34296/ |