LU-8311: Target does not mount with the new mgsnode parameter format in case of multirail configuration

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.5.3
    • Environment: Lustre 2.5.3.90 w/ Bull patches, including LU-5690
    • Severity: 3

    Description

      We are unable to mount the targets on Lustre servers when the MGS uses a multirail configuration.

      LU-4334 introduced a format change of the mgsnode value on the targets.

      Old format:
      mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1 mgsnode=192.168.101.42@tcp,192.168.102.42@tcp1

      New format:
      mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1:192.168.101.42@tcp,192.168.102.42@tcp1
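      To make the difference between the two formats concrete, here is a minimal sketch that rewrites a new-format value as the old format. It assumes, based only on the two examples above, that a colon separates MGS nodes and a comma separates the NIDs of one node. The function name mgsnode_new_to_old is hypothetical and is not part of any Lustre tool.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre utility): turn one colon-separated
# mgsnode value into one "mgsnode=" entry per MGS node.
# Assumption: ':' separates MGS nodes, ',' separates NIDs of one node.
mgsnode_new_to_old() (
    # The function body runs in a subshell, so changing IFS stays local.
    value=${1#mgsnode=}        # drop the leading "mgsnode=" if present
    out=
    IFS=:
    for node in $value; do     # unquoted on purpose: split on ':'
        out="${out:+$out }mgsnode=$node"
    done
    printf '%s\n' "$out"
)

# Prints the old-format line from the description.
mgsnode_new_to_old 'mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1:192.168.101.42@tcp,192.168.102.42@tcp1'
```

      The sketch only manipulates strings; it does not read or write CONFIGS/mountdata.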

      With patch LU-5690 applied, we are unable to start any target that uses this new format. The following error appears on the OSS console:

      LDISKFS-fs (vdb): Unrecognized mount option "192.168.102.41@tcp1:192.168.101.42@tcp" or missing value

      The debug log reports the following message while trying to mount OST 0:

      00000020:01200004:0.0F:1466084531.621867:0:2966:0:(obd_mount.c:1339:lustre_fill_super()) VFS Op: sb ffff88001f583c00
      00000020:01000004:0.0:1466084531.621882:0:2966:0:(obd_mount.c:830:lmd_print()) mount data:
      00000020:01000004:0.0:1466084531.621883:0:2966:0:(obd_mount.c:833:lmd_print()) device: /dev/vdb
      00000020:01000004:0.0:1466084531.621884:0:2966:0:(obd_mount.c:834:lmd_print()) flags: 0
      00000020:01000004:0.0:1466084531.621884:0:2966:0:(obd_mount.c:837:lmd_print()) options: errors=remount-ro,192.168.102.41@tcp1:192.168.101.42@tcp,192.168.102.42@tcp1
      00000020:01000004:0.0:1466084531.621885:0:2966:0:(obd_mount.c:1386:lustre_fill_super()) Mounting server from /dev/vdb
      00000020:01000004:0.0:1466084531.621887:0:2966:0:(obd_mount_server.c:1627:osd_start()) Attempting to start scratch-OST0000, type=osd-ldiskfs, lsifl=200002, mountfl=0
      00000020:01000004:0.0:1466084531.621925:0:2966:0:(obd_mount.c:191:lustre_start_simple()) Starting obd scratch-OST0000-osd (typ=osd-ldiskfs)
      00000004:00020000:0.0:1466084531.623545:0:2966:0:(osd_handler.c:5613:osd_mount()) scratch-OST0000-osd: can't mount /dev/vdb: -22
      00000020:00020000:0.0:1466084531.624487:0:2966:0:(obd_config.c:572:class_setup()) setup scratch-OST0000-osd failed (-22)
      00000020:00020000:0.0:1466084531.625290:0:2966:0:(obd_mount.c:200:lustre_start_simple()) scratch-OST0000-osd setup error -22
      00000020:01000000:0.0:1466084531.626153:0:2966:0:(obd_config.c:750:class_decref()) finishing cleanup of obd scratch-OST0000-osd (scratch-OST0000-osd_UUID)
      00000020:00020000:0.0:1466084531.626156:0:2966:0:(obd_mount_server.c:1701:server_fill_super()) Unable to start osd on /dev/vdb: -22
      00000020:01000004:0.0:1466084531.627005:0:2966:0:(obd_mount.c:653:lustre_put_lsi()) put ffff88001f583c00 1
      00000020:01000004:0.0:1466084531.627007:0:2966:0:(obd_mount.c:603:lustre_free_lsi()) Freeing lsi ffff880017c67000
      00000020:00020000:0.0:1466084531.627009:0:2966:0:(obd_mount.c:1405:lustre_fill_super()) Unable to mount (-22)
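      The "options:" line in the log suggests that only the first NID was consumed as the mgsnode value and the rest of the string was passed on as mount options. As a rough illustration (this is not the ldiskfs option parser), splitting that options string on commas yields the same token that the console error reports. The function name first_nid_like_option is hypothetical.

```shell
#!/bin/sh
# Illustration only: split a mount option string on ',' and print the
# first token that looks like a NID (contains '@').
first_nid_like_option() (
    # Subshell body keeps the IFS change local to this function.
    IFS=,
    for tok in $1; do          # unquoted on purpose: split on ','
        case $tok in
            *@*) printf '%s\n' "$tok"; break ;;
        esac
    done
)

# Options string copied from the lmd_print() line in the debug log.
first_nid_like_option 'errors=remount-ro,192.168.102.41@tcp1:192.168.101.42@tcp,192.168.102.42@tcp1'
```

      The printed token matches the "Unrecognized mount option" reported by LDISKFS-fs, which is consistent with the leftover NIDs reaching the backing file system as options.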

      This is easily reproducible with Lustre 2.5.3.90+LU-5690.

      # tunefs.lustre --erase-params --mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1 --mgsnode=192.168.101.42@tcp,192.168.102.42@tcp1 /dev/vdb
        checking for existing Lustre data: found
        Reading CONFIGS/mountdata

      Read previous values:
      Target: scratch-OST0000
      Index: 0
      Lustre FS: scratch
      Mount type: ldiskfs
      Flags: 0x42
      (OST update )
      Persistent mount opts: errors=remount-ro
      Parameters: mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1 mgsnode=192.168.101.42@tcp,192.168.102.42@tcp1

      Permanent disk data:
      Target: scratch-OST0000
      Index: 0
      Lustre FS: scratch
      Mount type: ldiskfs
      Flags: 0x42
      (OST update )
      Persistent mount opts: errors=remount-ro
      Parameters: mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1:192.168.101.42@tcp,192.168.102.42@tcp1

      Writing CONFIGS/mountdata

      # mount -t lustre /dev/vdb /mnt/fs/scratch/ost0
        mount.lustre: set /sys/block/vdb/queue/max_sectors_kb to 2147483647

      mount.lustre: mount /dev/vdb at /mnt/fs/scratch/ost0 failed: Invalid argument
      This may have multiple causes.
      Are the mount options correct?
      Check the syslog for more info.

          Activity

            adilger Andreas Dilger added a comment -

            The mount.lustre and mkfs.lustre man pages need to be updated to include better examples of how NIDs can be specified.

            • mount.lustre explains mgsspec, mgsnode, and mgsnid but does not provide an example of an MGS with a failover node and multiple NIDs.
            • mkfs.lustre does not explain nid/NID and also does not include anything but trivial examples.

            pjones Peter Jones added a comment -

            Landed for 2.9

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21329/
            Subject: LU-8311 mount: fix lmd_parse() to parse colon as NID delimiter
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2458067d8d55173ad68caac8c0460d46bf8106a1

            gerrit Gerrit Updater added a comment -

            Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/21329
            Subject: LU-8311 mount: fix lmd_parse() to parse colon as NID delimiter
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e74f6c0414f6ef538877e9fbca8540c4be3f2699

            bruno.travouillon Bruno Travouillon (Inactive) added a comment - edited

            For the record, an easy workaround is:

            # tunefs.lustre --erase-params --param mgsnode=192.168.101.41@tcp,192.168.102.41@tcp1 --param mgsnode=192.168.101.42@tcp,192.168.102.42@tcp1 /dev/vdb

            This way, the old format is used in CONFIGS/mountdata.
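
            For sites with several targets, the workaround command can be generated per device. The following is a minimal sketch that only prints the command line and never runs tunefs.lustre. It uses just the options shown in the workaround (--erase-params and --param); the function name build_workaround_cmd is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the workaround command.
# Usage: build_workaround_cmd DEVICE MGS_NODE_NIDS...
# Each MGS_NODE_NIDS argument is the comma-separated NID list of one MGS node.
build_workaround_cmd() (
    # Subshell body keeps dev, cmd and node local to this function.
    dev=$1
    shift
    cmd="tunefs.lustre --erase-params"
    for node in "$@"; do
        cmd="$cmd --param mgsnode=$node"
    done
    printf '%s %s\n' "$cmd" "$dev"
)

# Prints the same command as the workaround, without the "# " prompt.
build_workaround_cmd /dev/vdb \
    192.168.101.41@tcp,192.168.102.41@tcp1 \
    192.168.101.42@tcp,192.168.102.42@tcp1
```

            Printing rather than executing leaves room to review each command first, since --erase-params removes the existing parameters on the target.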

            jgmitter Joseph Gmitter (Inactive) added a comment -

            Hi Jian,

            Can you please look into this issue?

            Thanks.
            Joe

            People

              Assignee: yujian Jian Yu
              Reporter: bruno.travouillon Bruno Travouillon (Inactive)
              Votes: 0
              Watchers: 10