[LU-8311] Target does not mount with the new mgsnode parameter format in case of multirail configuration Created: 21/Jun/16 Updated: 14/Jun/18 Resolved: 29/Oct/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Bruno Travouillon (Inactive) | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | p4b | ||
| Environment: |
Lustre 2.5.3.90 w/ Bull patches, including |
||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
We are unable to mount the targets on Lustre servers when using multirail configuration on the MGS.
Old format:
New format:
With patch
LDISKFS-fs (vdb): Unrecognized mount option "192.168.102.41@tcp1:192.168.101.42@tcp" or missing value
The debug log reports the following message while trying to mount OST 0:
This is easily reproducible with Lustre 2.5.3.90+
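The rejected option in the LDISKFS-fs message is consistent with the option string being split on every comma before the mgsnode value is fully consumed. The following is a minimal Python sketch of that effect, not the Lustre parsing code. The option string is a hypothetical reconstruction built from the NIDs quoted in this ticket, and `naive_split` and `stray_tokens` are illustrative names.

```python
from typing import List

# Hypothetical reconstruction: two MGS nodes joined by ':', each with two
# comma-separated NIDs. The real option string is not shown in the report.
NEW_FORMAT = (
    "mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1"
    ":192.168.101.42@tcp,192.168.102.42@tcp1"
)


def naive_split(options: str) -> List[str]:
    """Split an option string on every comma, as a generic parser would."""
    return options.split(",")


def stray_tokens(tokens: List[str]) -> List[str]:
    """Return tokens that no longer look like key=value options."""
    return [t for t in tokens if "=" not in t]


tokens = naive_split(NEW_FORMAT)
stray = stray_tokens(tokens)
print(stray[0])
```

Under these assumptions the first stray token is the same string that ldiskfs reports as an unrecognized mount option.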
|
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 21/Jun/16 ] |
|
Hi Jian, Can you please look into this issue? Thanks. |
| Comment by Bruno Travouillon (Inactive) [ 30/Jun/16 ] |
|
For the record, an easy workaround is:
# tunefs.lustre --erase-params --param mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1 --param mgsnode=192.168.101.42@tcp,192.168.102.42@tcp1 /dev/vdb
This way, the old format is used in CONFIGS/mountdata. |
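The workaround amounts to giving each MGS node its own mgsnode parameter instead of joining the nodes with a colon. Below is a minimal Python sketch of that conversion. It assumes the new format separates nodes with ':' and the NIDs of one node with ','. The helper name `old_format_args` is hypothetical, and the sketch only builds the argument list; it does not run tunefs.lustre.

```python
from typing import List


def old_format_args(mgs_spec: str) -> List[str]:
    """Turn 'nid,nid:nid,nid' into one '--param mgsnode=...' pair per node."""
    args: List[str] = []
    for node in mgs_spec.split(":"):
        # Each node is a comma-separated list of its NIDs.
        nids = [n.strip() for n in node.split(",") if n.strip()]
        if not nids:
            raise ValueError("empty MGS node in spec: %r" % mgs_spec)
        args += ["--param", "mgsnode=" + ",".join(nids)]
    return args


spec = (
    "192.168.101.41@tcp,192.168.102.41@tcp1"
    ":192.168.101.42@tcp,192.168.102.42@tcp1"
)
print(" ".join(old_format_args(spec)))
```

The printed arguments match the two `--param mgsnode=...` options of the command above. Splitting on ':' would not be safe for addresses that themselves contain colons, which this sketch does not handle.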
| Comment by Gerrit Updater [ 15/Jul/16 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/21329 |
| Comment by Gerrit Updater [ 06/Aug/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21329/ |
| Comment by Peter Jones [ 06/Aug/16 ] |
|
Landed for 2.9 |
| Comment by Andreas Dilger [ 03/Oct/16 ] |
|
The mount.lustre and mkfs.lustre man pages need to be updated to include better examples of how NIDs can be specified.
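As an illustration of the NID syntax used throughout this ticket, here is a minimal Python sketch that splits a NID of the form address@network into its parts. It assumes IPv4-style addresses and assumes that a network name without a trailing number (for example tcp) denotes network 0. `parse_nid` is a hypothetical helper, not part of any Lustre tool.

```python
import re
from typing import Tuple

# address '@' network type (e.g. tcp, o2ib) followed by an optional number.
_NID_RE = re.compile(r"([^@\s]+)@([a-z0-9]*[a-z])(\d*)")


def parse_nid(nid: str) -> Tuple[str, str, int]:
    """Return (address, network type, network number) for one NID string."""
    match = _NID_RE.fullmatch(nid)
    if match is None:
        raise ValueError("not a valid NID: %r" % nid)
    addr, net_type, net_num = match.groups()
    # Assumption: a missing network number means network 0.
    return addr, net_type, int(net_num) if net_num else 0


for nid in ("192.168.101.41@tcp", "192.168.102.41@tcp1", "10.148.0.30@o2ib0"):
    print(nid, "->", parse_nid(nid))
```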
|
| Comment by Gerrit Updater [ 25/Oct/16 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/23355 |
| Comment by Gerrit Updater [ 28/Oct/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/23355/ |
| Comment by Peter Jones [ 29/Oct/16 ] |
|
Landed for 2.9 |
| Comment by Darby Vicker [ 19/Jan/17 ] |
|
We are running into something very similar to this - not sure if it's related or something different. Lots of detail in a thread on the mailing list - here is a link to one of the latest posts. http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-January/014154.html The summary of our situation is that our LFS was originally formatted using 2.8 but we have since upgraded to 2.9.51. We are using a JBOD with server pairs for failover and are using ZFS as the backend. All servers are dual-homed on both Ethernet and IB. MDT and OST failover works fine. MGS failover doesn't work if we have both Ethernet and IB NIDs but does if we only have Ethernet NIDs. We have built our own Lustre server RPMs using a "git checkout 2.9.51" and zfs 0.6.5.8-1. I've verified that commit 2458067d8d55173ad68caac8c0460d46bf8106a1 is in the git log. Any help would be much appreciated. |
| Comment by Andreas Dilger [ 20/Jan/17 ] |
|
It is important to note that the 2.9.51 code is a development tag and as such has no expectation of being tested or supported. Development tags may contain protocol changes and experimental code, so unless you are using this only for testing the stability of the development branch, I would suggest going back to 2.9.0. |
| Comment by Darby Vicker [ 20/Jan/17 ] |
|
Good to know - will do. |
| Comment by Darby Vicker [ 23/Jan/17 ] |
|
I just uploaded a couple of debug logs. Both were taken while mounting an OST on one of our OSSes. One was while we were configured only for Ethernet.
tunefs.lustre \
--verbose \
--writeconf \
--erase-param \
--mgsnode=192.52.98.30@tcp0 \
--mgsnode=192.52.98.31@tcp0 \
--servicenode=${LUSTRE_LOCAL_TCP_IP}@tcp0 \
--servicenode=${LUSTRE_PEER_TCP_IP}@tcp0 \
$pool/ost-fsl
The other was while configured with both IB and Ethernet.
tunefs.lustre \
--verbose \
--writeconf \
--erase-param \
--mgsnode=192.52.98.30@tcp0,10.148.0.30@o2ib0 \
--mgsnode=192.52.98.31@tcp0,10.148.0.31@o2ib0 \
--servicenode=${LUSTRE_LOCAL_TCP_IP}@tcp0,${LUSTRE_LOCAL_IB_IP}@o2ib0 \
--servicenode=${LUSTRE_PEER_TCP_IP}@tcp0,${LUSTRE_PEER_IB_IP}@o2ib0 \
$pool/ost-fsl
|
| Comment by Jian Yu [ 25/Jan/17 ] |
|
Hi Darby, In the debug log debug.log.ib_and_eth, I didn't see any MGS NID that was not parsed by lmd_parse(). So, it's not the same issue as this one. |
| Comment by Darby Vicker [ 25/Jan/17 ] |
|
Thanks a lot for the info. If you need any more data from me, I'd be glad to post that - either to this LU or one of those others. I'd like to try reverting that patch from the 2.9 release and see if it fixes our issue. Please let me know if you think that's worthwhile and, if so, which LU I should post the info to. |
| Comment by Jian Yu [ 26/Jan/17 ] |
|
Thank you Darby. The issue will be resolved in |