[LU-5950] Clients fail to connect following OSS failover in file system with 8 MDSs Created: 24/Nov/14 Updated: 05/Dec/14 Resolved: 05/Dec/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.5.2 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexander Boyko | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Environment: |
options lnet ip2nets="o2ib8(ib0) 10.150.10.[1-12]; o2ib8000(ib0) 10.150.10.[1-8]; o2ib8002(ib0:1) 10.151.10.[9-12];" |
||
| Severity: | 3 |
| Rank (Obsolete): | 16610 |
| Description |
|
The OSS host has both 10.150 and 10.151 IP addresses. The 10.150 address is assigned to the IB NIC itself, while the 10.151 address is on a virtual interface.

oss001 kernel: LNet: Added LNI 10.150.10.9@o2ib8 [126/4032/0/0]
oss001 kernel: LNet: Added LNI 10.151.10.9@o2ib8002 [126/4032/0/0]

The client and MDS nodes fail to complete recovery.

2014-10-29T17:08:36.465005-05:00 c0-0c1s13n3 LustreError: 6909:0:(mgc_request.c:1488:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 10.150.10.10@o2ib8
2014-10-29T17:08:36.465055-05:00 c0-0c1s13n3 Lustre: 6909:0:(mgc_request.c:1649:mgc_process_recover_log()) Process recover log esfprod-cliir error -2

The main problem is that a node has a different network address on each of its networks, and the Lustre config processing is missing the functionality to handle this. |
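To illustrate why oss001 ends up with one NID per network, here is a minimal Python sketch that evaluates the ip2nets string from the Environment field against a host's IP addresses. It is not LNet's parser: the function names are hypothetical, only literal octets, `*`, and `[a-b]` ranges are handled, and each matching IP is simply paired with the rule's network name.

```python
import re

# ip2nets value copied from the Environment field of this ticket.
IP2NETS = ("o2ib8(ib0) 10.150.10.[1-12]; "
           "o2ib8000(ib0) 10.150.10.[1-8]; "
           "o2ib8002(ib0:1) 10.151.10.[9-12];")

# One rule looks like: net(interface) ip-pattern
RULE_RE = re.compile(r"^(\w+)\(([^)]*)\)\s+(\S+)$")


def octet_matches(pattern, value):
    """Match one octet against a literal, '*', or '[a-b]' (simplified)."""
    if pattern == "*":
        return True
    m = re.fullmatch(r"\[(\d+)-(\d+)\]", pattern)
    if m:
        return int(m.group(1)) <= value <= int(m.group(2))
    return pattern.isdigit() and int(pattern) == value


def parse_rules(ip2nets):
    """Split the ip2nets string into (net, interface, octet-patterns) rules."""
    rules = []
    for chunk in ip2nets.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        m = RULE_RE.match(chunk)
        if m is None:
            raise ValueError("unsupported rule: %r" % chunk)
        net, iface, ip_pattern = m.groups()
        rules.append((net, iface, ip_pattern.split(".")))
    return rules


def nids_for(ips, rules):
    """Return 'ip@net' for every (rule, ip) pair that matches (assumed model)."""
    nids = []
    for net, _iface, octets in rules:
        for ip in ips:
            values = [int(part) for part in ip.split(".")]
            if len(values) == len(octets) and all(
                    octet_matches(p, v) for p, v in zip(octets, values)):
                nids.append("%s@%s" % (ip, net))
    return nids


if __name__ == "__main__":
    rules = parse_rules(IP2NETS)
    # oss001 carries both addresses, so it matches two rules.
    print(nids_for(["10.150.10.9", "10.151.10.9"], rules))
```

Under these assumptions, the two addresses on oss001 yield `10.150.10.9@o2ib8` and `10.151.10.9@o2ib8002`, the same pair reported in the `Added LNI` kernel messages, which is the multi-address situation the recovery log processing did not handle.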
| Comments |
| Comment by Gerrit Updater [ 24/Nov/14 ] |
|
Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: http://review.whamcloud.com/12829 |
| Comment by Alexander Boyko [ 24/Nov/14 ] |
|
fix http://review.whamcloud.com/12829 |
| Comment by Gerrit Updater [ 04/Dec/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12829/ |
| Comment by Andreas Dilger [ 05/Dec/14 ] |
|
Patch landed for 2.7.0. |