[LU-4749] ZFS-backed OST mkfs.lustre --servicenode does not correctly add failover_nids Created: 11/Mar/14  Updated: 27/Apr/15  Resolved: 09/Oct/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.2, Lustre 2.7.0
Fix Version/s: Lustre 2.7.0, Lustre 2.5.4

Type: Bug Priority: Blocker
Reporter: Anthony Alba Assignee: Li Wei (Inactive)
Resolution: Fixed Votes: 0
Labels: prz, zfs
Environment:

CentOS 6.4, ZFS 0.6.2


Issue Links:
Related
is related to LU-4334 With ZFS can only declare a single mg... Resolved
Epic/Theme: ZFS
Severity: 3
Rank (Obsolete): 13075

 Description   

When a ZFS-backed OST is created with multiple --servicenode options, only one failover NID (the last one given) is stored.

mkfs.lustre --ost --index=1 --fsname=saturn --backfstype=zfs --mgsnode=192.168.122.73@tcp --servicenode=192.168.122.76@tcp --servicenode=192.168.122.78@tcp lsrv3/saturn-ost1

# tunefs.lustre --print lsrv3/saturn-ost1
    checking for existing Lustre data: found

Read previous values:
Target: saturn-OST0001
Index: 1
Lustre FS: saturn
Mount type: zfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts:
Parameters: failover.node=192.168.122.78@tcp mgsnode=192.168.122.73@tcp

Permanent disk data:
Target: saturn-OST0001
Index: 1
Lustre FS: saturn
Mount type: zfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts:
Parameters: failover.node=192.168.122.78@tcp mgsnode=192.168.122.73@tcp

On MGS:

# lctl get_param osp.saturn-OST0001-osc-MDT0000.import
    osp.saturn-OST0001-osc-MDT0000.import=
    import:
    name: saturn-OST0001-osc-MDT0000
    target: saturn-OST0001_UUID
    state: FULL
    instance: 1
    connect_flags: [lov_index, unused, version, request_portal, adaptive_timeouts, lru_resize, fid_is_enabled, skip_orphan, full20, lvb_type]
    import_flags: [replayable, pingable]
    connection:
    failover_nids: [192.168.122.78@tcp]
    current_connection: 192.168.122.78@tcp

For an ldiskfs-backed OST, both NIDs are stored:

# mkfs.lustre --ost --index=0 --fsname=saturn --servicenode=192.168.122.76@tcp --mgsnode=192.168.122.73@tcp --reformat /dev/vdb
# tunefs.lustre --print /dev/vdb
    checking for existing Lustre data: found
    Reading CONFIGS/mountdata

Read previous values:
Target: saturn-OST0000
Index: 0
Lustre FS: saturn
Mount type: ldiskfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts: errors=remount-ro
Parameters: failover.node=192.168.122.76@tcp failover.node=192.168.122.78@tcp mgsnode=192.168.122.73@tcp

Permanent disk data:
Target: saturn-OST0000
Index: 0
Lustre FS: saturn
Mount type: ldiskfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts: errors=remount-ro
Parameters: failover.node=192.168.122.76@tcp failover.node=192.168.122.78@tcp mgsnode=192.168.122.73@tcp

exiting before disk write.

# lctl get_param osp.saturn-OST0000-osc-MDT0000.import
    osp.saturn-OST0000-osc-MDT0000.import=
    import:
    name: saturn-OST0000-osc-MDT0000
    target: saturn-OST0000_UUID
    state: FULL
    instance: 2
    connect_flags: [lov_index, unused, version, request_portal, adaptive_timeouts, lru_resize, fid_is_enabled, skip_orphan, full20, lvb_type]
    import_flags: [replayable, pingable]
    connection:
    failover_nids: [192.168.122.76@tcp, 192.168.122.78@tcp]
    current_connection: 192.168.122.76@tcp
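
To make the difference between the two backends easy to check, here is a minimal Python sketch that pulls the failover.node values out of a "Parameters:" line as printed by tunefs.lustre --print. The helper name failover_nids is hypothetical (it is not part of Lustre), and the two input lines are copied from the outputs above.

```python
def failover_nids(parameters_line):
    """Return the failover.node values from a 'Parameters:' line, in order."""
    # Everything after the "Parameters:" label is a space-separated key=value list.
    _, _, params = parameters_line.partition("Parameters:")
    nids = []
    for token in params.split():
        key, sep, value = token.partition("=")
        if sep and key == "failover.node":
            nids.append(value)
    return nids


# Lines copied from the two tunefs.lustre --print outputs in this report.
zfs_line = "Parameters: failover.node=192.168.122.78@tcp mgsnode=192.168.122.73@tcp"
ldiskfs_line = ("Parameters: failover.node=192.168.122.76@tcp "
                "failover.node=192.168.122.78@tcp mgsnode=192.168.122.73@tcp")

print("zfs:    ", failover_nids(zfs_line))
print("ldiskfs:", failover_nids(ldiskfs_line))
```

With the ZFS line the list has a single entry (the second --servicenode), while the ldiskfs line yields both NIDs.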


 Comments   
Comment by Anthony Alba [ 11/Mar/14 ]

For the ldiskfs case, I omitted the command that added the failover NIDs:

# tunefs.lustre --erase-params --servicenode=192.168.122.76@tcp --servicenode=192.168.122.78@tcp --mgsnode=192.168.122.73@tcp /dev/vdb
Comment by Anthony Alba [ 11/Mar/14 ]

1. A second oddity: I think --mgsnode=ABCD --mgsnode=XYZW also doesn't work on 2.4.2/ZFS-backed OSTs. The 2nd mgsnode overrides the first. For LDISKFS-backed OSTs it seems to work.

2. Does the syntax --mgsnode=Pri_NID:Sec_NID work for mkfs.lustre, or should one use
--mgsnode=Pri_NID --mgsnode=Sec_NID?

Comment by Jodi Levi (Inactive) [ 20/Aug/14 ]

http://review.whamcloud.com/11161 will fix this issue

Comment by Isaac Huang (Inactive) [ 27/Aug/14 ]

This looks like a duplicate of LU-4334. Both tunefs.lustre and mkfs.lustre are built from the same sources: mkfs_lustre.c mount_utils.c mount_utils.h.

Comment by Li Wei (Inactive) [ 17/Sep/14 ]

I was waiting for the LU-4334 patch to fix this, but that patch turned out to be insufficient. The current MGS cannot handle colons in failover.node values. I'll submit my patch soon.
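
As an illustration only, the following Python sketch shows why a store that keeps one value per key loses repeated parameters, and why joining them with a colon obliges the reader to split them again. It assumes the ZFS backend keeps each parameter as a single-valued property and that repeated values are joined with ":"; all function names are hypothetical, and this is not the code from either patch.

```python
def overwrite(params):
    """One value per key: a later entry silently replaces an earlier one."""
    return dict(params)


def merge_repeated(params, sep=":"):
    """Collapse repeated keys into a single 'v1:v2' value (assumed storage form)."""
    merged = {}
    for key, value in params:
        merged[key] = merged[key] + sep + value if key in merged else value
    return merged


def expand_value(key, value, sep=":"):
    """What a consumer has to do to recover the individual entries."""
    return [(key, v) for v in value.split(sep) if v]


# Parameters equivalent to the mkfs.lustre command line in the description.
params = [
    ("mgsnode", "192.168.122.73@tcp"),
    ("failover.node", "192.168.122.76@tcp"),
    ("failover.node", "192.168.122.78@tcp"),
]

print(overwrite(params)["failover.node"])       # only the last NID survives
stored = merge_repeated(params)["failover.node"]
print(stored)                                   # both NIDs in one colon-joined value
print(expand_value("failover.node", stored))    # consumer must split on ':' again
```

In this sketch, a consumer that skips the expand_value step would treat the whole colon-joined string as a single NID.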

Comment by Li Wei (Inactive) [ 17/Sep/14 ]

http://review.whamcloud.com/11956

Comment by Jodi Levi (Inactive) [ 24/Sep/14 ]

Patch landed to Master

Comment by Bob Glossman (Inactive) [ 06/Oct/14 ]

backport to b2_5:
http://review.whamcloud.com/12196

Comment by Oleg Drokin [ 06/Oct/14 ]

I think this patch causes failures in LU-5706, e.g.: https://testing.hpdd.intel.com/test_sessions/6c3961a2-4aa8-11e4-95b1-5254006e85c2

Once this was cherry-picked to b2_5 as a separate patch, the failure started to hit.
It might have been a fluke, but reverting it made the problem disappear.
http://review.whamcloud.com/#/c/12166/ vs http://review.whamcloud.com/#/c/12183/

Comment by Oleg Drokin [ 06/Oct/14 ]

Also, looking at the Maloo results, these problems really started to appear at the end of September; before that, all failures were in 2013.

So I think chances are high this is the culprit.

Comment by Andreas Dilger [ 07/Oct/14 ]

Reopen due to potential problems with the patch.

Comment by Andreas Dilger [ 07/Oct/14 ]

It looks like there is some garbage being written into the ZFS properties. From the test log output of https://testing.hpdd.intel.com/test_sets/a33da7e2-4a9b-11e4-adcb-5254006e85c2

   Permanent disk data:
Target:     lustre-OST0000
Index:      0
Lustre FS:  lustre
Mount type: zfs
Flags:      0x42
              (OST update )
Persistent mount opts: 
Parameters: sys.timeout=20 mgsnode=10.1.5.243@tcp failover.node=��6

Writing lustre-ost1/ost1 properties
  lustre:version=1
  lustre:flags=66
  lustre:index=0
  lustre:fsname=lustre
  lustre:svname=lustre-OST0000
  lustre:sys.timeout=20
  lustre:mgsnode=10.1.5.243@tcp
  lustre:failover.node=��6
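
A rough way to spot this kind of corruption in a test log is to check that every mgsnode and failover.node value looks like a NID. The Python sketch below uses a deliberately simplified pattern ("address@network", for example 10.1.5.243@tcp) that is an assumption made for illustration, not Lustre's full NID grammar; the helper name suspicious_nids is hypothetical, and "\ufffd\ufffd6" stands in for the garbage bytes shown above.

```python
import re

# Simplified NID shape: "<address>@<network>[<number>]", e.g. 10.1.5.243@tcp.
# This is an illustrative assumption, not the complete NID syntax.
NID_RE = re.compile(r"^[0-9A-Za-z._-]+@[a-z]+[0-9]*$")


def suspicious_nids(parameters_line):
    """Return (key, value) pairs whose value does not look like a NID."""
    bad = []
    _, _, params = parameters_line.partition("Parameters:")
    for token in params.split():
        key, _, value = token.partition("=")
        if key in ("mgsnode", "failover.node"):
            # A value may hold several NIDs separated by ':' or ','.
            for nid in re.split(r"[:,]", value):
                if not NID_RE.match(nid):
                    bad.append((key, nid))
    return bad


# The Parameters line from the log above; \ufffd marks each unreadable byte.
line = ("Parameters: sys.timeout=20 mgsnode=10.1.5.243@tcp "
        "failover.node=\ufffd\ufffd6")
print(suspicious_nids(line))
```

Only the failover.node entry is reported; sys.timeout is ignored and the mgsnode value passes the check.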
Comment by Li Wei (Inactive) [ 07/Oct/14 ]

This is not the cause of LU-5706; please see my comments there.

Comment by Li Wei (Inactive) [ 09/Oct/14 ]

I think this should be either closed or left open for Bob's b2_5 port. Removed the link to LU-5706.
