[LU-4749] ZFS-backed OST mkfs.lustre --servicenode does not correctly add failover_nids

Created: 11/Mar/14  Updated: 27/Apr/15  Resolved: 09/Oct/14
Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.2, Lustre 2.7.0
Fix Version/s: Lustre 2.7.0, Lustre 2.5.4
Type: Bug
Priority: Blocker
Reporter: Anthony Alba
Assignee: Li Wei (Inactive)
Resolution: Fixed
Votes: 0
Labels: prz, zfs
Environment: CentOS 6.4, ZFS 0.6.2
Issue Links:
Epic/Theme: ZFS
Severity: 3
Rank (Obsolete): 13075
Description
When creating ZFS-backed OSTs using the --servicenode syntax, only one failover NID is stored:

    mkfs.lustre --ost --index=1 --fsname=saturn --backfstype=zfs \
        --mgsnode=192.168.122.73@tcp --servicenode=192.168.122.76@tcp \
        --servicenode=192.168.122.78@tcp lsrv3/saturn-ost1

    Read previous values:
    Permanent disk data:
    On MGS:

For an ldiskfs-backed OST, two NIDs are stored:

    Read previous values:
    Permanent disk data:
    exiting before disk write.
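To see what actually got stored, the lustre:* user properties can be read back from the dataset, or tunefs.lustre can be asked to print the parameters without writing anything. A minimal sketch, assuming the dataset name from the command above (the lustre:failover.node property, shown in a log later in this ticket, is where both NIDs should end up):

    # Read back the failover NIDs that mkfs.lustre stored as a ZFS user
    # property on the OST dataset.
    zfs get -H -o value lustre:failover.node lsrv3/saturn-ost1

    # Print the stored Lustre parameters without modifying the target.
    tunefs.lustre --dryrun lsrv3/saturn-ost1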
Comments

Comment by Anthony Alba [ 11/Mar/14 ]
For the ldiskfs case, I omitted adding the failover NID:
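A hypothetical reconstruction of the command that comment refers to, consistent with the failover NID being omitted (the block device /dev/sdb is an assumption; the original command did not survive this export):

    # Hypothetical ldiskfs run without the failover --servicenode;
    # /dev/sdb stands in for whatever block device was actually used.
    mkfs.lustre --ost --index=1 --fsname=saturn --backfstype=ldiskfs \
        --mgsnode=192.168.122.73@tcp --servicenode=192.168.122.76@tcp /dev/sdb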
Comment by Anthony Alba [ 11/Mar/14 ]
1. A second oddity: I think --mgsnode=ABCD --mgsnode=XYZW also does not work on 2.4.2 with ZFS-backed OSTs; the second mgsnode overrides the first. For ldiskfs-backed OSTs it seems to work.
2. Does the syntax --mgsnode=Pri_NID:Sec_NID work for mkfs.lustre, or should one be using repeated --mgsnode options? (Both candidate forms are sketched below.)
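A quick sketch of the two candidate spellings from point 2, reusing the placeholder NIDs ABCD and XYZW from point 1; whether the colon form is accepted by mkfs.lustre is exactly the open question here:

    # Repeated-option form; on 2.4.2 with ZFS this reportedly keeps only
    # the second value (XYZW).
    mkfs.lustre --ost --mgsnode=ABCD --mgsnode=XYZW <target>

    # Colon-separated form asked about in point 2.
    mkfs.lustre --ost --mgsnode=ABCD:XYZW <target>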
Comment by Jodi Levi (Inactive) [ 20/Aug/14 ]

http://review.whamcloud.com/11161 will fix this issue.
Comment by Isaac Huang (Inactive) [ 27/Aug/14 ]

This looks like a duplicate of
Comment by Li Wei (Inactive) [ 17/Sep/14 ]

I was waiting for the

Comment by Li Wei (Inactive) [ 17/Sep/14 ]
Comment by Jodi Levi (Inactive) [ 24/Sep/14 ]

Patch landed to master.
Comment by Bob Glossman (Inactive) [ 06/Oct/14 ]

backport to b2_5:
Comment by Oleg Drokin [ 06/Oct/14 ]

I think this patch causes failures in
Once this was cherry-picked to b2_5 as a separate patch, it started to hit.
Comment by Oleg Drokin [ 06/Oct/14 ]

Also, looking at the Maloo results, it is really the end of September when these problems started to appear; before that, all failures were in 2013. So I think chances are high this is the culprit.
Comment by Andreas Dilger [ 07/Oct/14 ]

Reopening due to potential problems with the patch.
Comment by Andreas Dilger [ 07/Oct/14 ]
It looks like there is some garbage being written into the ZFS properties. From the test log output of https://testing.hpdd.intel.com/test_sets/a33da7e2-4a9b-11e4-adcb-5254006e85c2:

    Permanent disk data:
    Target:     lustre-OST0000
    Index:      0
    Lustre FS:  lustre
    Mount type: zfs
    Flags:      0x42
                (OST update )
    Persistent mount opts:
    Parameters: sys.timeout=20 mgsnode=10.1.5.243@tcp failover.node=��6

    Writing lustre-ost1/ost1 properties
      lustre:version=1
      lustre:flags=66
      lustre:index=0
      lustre:fsname=lustre
      lustre:svname=lustre-OST0000
      lustre:sys.timeout=20
      lustre:mgsnode=10.1.5.243@tcp
      lustre:failover.node=��6
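One way to dump exactly what landed on disk is to list the locally set properties on the dataset; a minimal sketch, assuming the dataset name lustre-ost1/ost1 from the log above:

    # List only locally set properties, which include the lustre:* user
    # properties written by mkfs.lustre.
    zfs get -s local all lustre-ost1/ost1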
Comment by Li Wei (Inactive) [ 07/Oct/14 ]

This is not the cause of
Comment by Li Wei (Inactive) [ 09/Oct/14 ]

I think this should be either closed or left open for Bob's b2_5 port. Removed the link to