
Specifying multiple networks in NIDs no longer works

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0
    • None
    • 3
    • 8593

    Description

With older versions of Lustre I used to have IB as the default network, but also an Ethernet connection as a fallback, so each NID specified for failover.node or mgs.node contained both networks, e.g.:

      failover.node=10.3.0.228@o2ib,192.168.50.128@tcp

      When I try this with 2.4.0 it appears the parser does not understand this syntax anymore. I get an error in syslog:

      LDISKFS-fs (dm-7): Unrecognized mount option "192.168.50.128@tcp" or missing value

Activity

[LU-3445] Specifying multiple networks in NIDs no longer works

adegremont Aurelien Degremont (Inactive) added a comment -

Thanks Jian. For my own tracking I've just created the ticket: LU-4460.
            yujian Jian Yu added a comment -

            Thanks Aurelien for pointing this out. I'll look into these issues and figure out whether I should fix the original issue in lmd_parse(). I'll create a new Jira ticket to track the work.


adegremont Aurelien Degremont (Inactive) added a comment -

It seems to me this bug is still there, for two reasons:

- the patch only takes care of mkfs/tunefs, so there is still an upgrade issue if mountdata contains something like failover.node=10.3.0.228@o2ib,192.168.50.128@tcp
The workaround appears to be a writeconf, which has side effects. Are there UPGRADE notes somewhere covering this?

- the patch seems to modify a string like

            --failnode=10.3.0.228@o2ib,192.168.50.128@tcp
            

            into

--failnode=10.3.0.228@o2ib --failnode=192.168.50.128@tcp
            

These are not the same: the first example refers to one failnode with two NIDs to reach it; the second refers to two different failnodes with one NID each.

            pjones Peter Jones added a comment -

            Landed for 2.4.1 and 2.5

bogl Bob Glossman (Inactive) added a comment -

back port to b2_4: http://review.whamcloud.com/7344
            green Oleg Drokin added a comment -

            this patch cannot be cherrypicked to b2_4 due to a conflict

            yujian Jian Yu added a comment -

            Hi Oleg,

            Could you please cherry-pick the patch to Lustre b2_4 branch? Thanks.

            yujian Jian Yu added a comment -

            Patch for Lustre master branch: http://review.whamcloud.com/6686

            yujian Jian Yu added a comment -

            A simple test showed that:

            # tunefs.lustre --dryrun /dev/vda5
            checking for existing Lustre data: found
            Reading CONFIGS/mountdata

            Read previous values:
            Target: lustre-OST0000
            Index: 0
            Lustre FS: lustre
            Mount type: ldiskfs
            Flags: 0x62
            (OST first_time update )
            Persistent mount opts: errors=remount-ro
            Parameters: sys.timeout=20 mgsnode=10.3.0.228@o2ib,192.168.50.128@tcp

            Permanent disk data:
            Target: lustre:OST0000
            Index: 0
            Lustre FS: lustre
            Mount type: ldiskfs
            Flags: 0x62
            (OST first_time update )
            Persistent mount opts: errors=remount-ro
            Parameters: sys.timeout=20 mgsnode=10.3.0.228@o2ib,192.168.50.128@tcp

            exiting before disk write.

            # mount -v -t lustre /dev/vda5 /mnt/ost1
            arg[0] = /sbin/mount.lustre
            arg[1] = -v
            arg[2] = -o
            arg[3] = rw
            arg[4] = /dev/vda5
            arg[5] = /mnt/ost1
            source = /dev/vda5 (/dev/vda5), target = /mnt/ost1
            options = rw
            checking for existing Lustre data: found
            Reading CONFIGS/mountdata
            Writing CONFIGS/mountdata
            mounting device /dev/vda5 at /mnt/ost1, flags=0x1000000 options=osd=osd-ldiskfs,errors=remount-ro,mgsnode=10.3.0.228@o2ib,192.168.50.128@tcp,virgin,update,param=sys.timeout=20,param=mgsnode=10.3.0.228@o2ib,192.168.50.128@tcp,svname=lustre-OST0000,device=/dev/vda5
            mount.lustre: mount /dev/vda5 at /mnt/ost1 failed: Invalid argument retries left: 0
            mount.lustre: mount /dev/vda5 at /mnt/ost1 failed: Invalid argument
            This may have multiple causes.
            Are the mount options correct?
            Check the syslog for more info.

I'm creating a patch that fixes add_param() to add "key" before each comma-separated "sub-val" in "val".

            pjones Peter Jones added a comment -

            Yu, Jian

            Could you please look into this issue?

            Thanks

            Peter


People

  yujian Jian Yu
  omangold Oliver Mangold
  Votes: 0
  Watchers: 11