Can't mount MGS due to label being less than 8 characters long.

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.7.0
    • Lustre 2.7.0
    • 3
    • 16404

    Description

      The patch for LU-5383 fixed an out-of-bounds access on ldd_svname. It tests whether ldd_svname is at least 8 characters long, but server_make_name() in lustre_disk.h sets ldd_svname for the MGS to "MGS", which is too short. This can prevent the MGS from mounting.
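The mismatch described above can be sketched in a few lines of shell. This is an illustration only, not the Lustre source: the function names `name_ok_strict` and `name_ok_relaxed` are hypothetical, and the relaxed variant simply assumes that special-casing the literal name "MGS" is one way to let the label through.

```shell
#!/bin/bash
# Hypothetical illustration of the length check; not taken from Lustre.

# Strict check: any name shorter than 8 characters is rejected.
name_ok_strict() {
    local name=$1
    [ "${#name}" -ge 8 ]
}

# Relaxed check (assumed approach): accept the literal "MGS" label,
# and keep the 8-character minimum for every other name.
name_ok_relaxed() {
    local name=$1
    [ "$name" = "MGS" ] || [ "${#name}" -ge 8 ]
}

for name in "MGS" "lustre-MDT0000"; do
    if name_ok_strict "$name"; then s=accepted; else s=rejected; fi
    if name_ok_relaxed "$name"; then r=accepted; else r=rejected; fi
    echo "$name: strict=$s relaxed=$r"
done
```

With the strict check, the three-character "MGS" label is rejected while a target name such as "lustre-MDT0000" passes, which matches the "invalid name 'MGS'" failure reported in this ticket.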

          Activity

            [LU-5863] Can't mount MGS due to label being less than 8 characters long.

            paf Patrick Farrell (Inactive) added a comment -

            In case it influences the decision to know what people are doing in the real world, Cray is increasingly using separate MGS and MDS volumes for ease of backup and recovery.
            Also for DNE - it's convenient to have MDS volumes all identical.
            yujian Jian Yu added a comment -

            Hi Andreas,

            Clearly, we need a conf-sanity test that formats a separate MGS and mounts it, since we can't possibly have had such a test if this bug slipped through.

            I found that conf-sanity test 21d is a basic test case that starts a separate MGS. However, the autotest system always uses a combined MGT and MDT configuration, so test 21d has always been skipped. Should we enhance the autotest system to add a configuration with separate MGT and MDT, and run a test session with that configuration?
            For now, for this ticket, I'll just add a new test case that formats $fs2mds_DEV as a separate MGT and mounts it.
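A rough sketch of what "format a separate MGT and mount it" amounts to, written as plain shell rather than as a conf-sanity.sh test case. The helper names `run` and `check_separate_mgs`, the `DRY_RUN` switch, and the placeholder device and mount point are all assumptions made for illustration; only the `mkfs.lustre --reformat --mgs` and `mount -t lustre` invocations come from this ticket.

```shell
#!/bin/bash
# Sketch only: plain shell, not the conf-sanity.sh framework.
# MGS_DEV and MGS_MNT are hypothetical placeholders. DRY_RUN=1 (the default)
# prints each command instead of executing it.

DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Format a device as a standalone MGT, mount it, then unmount it.
# Returns non-zero as soon as any step fails.
check_separate_mgs() {
    local dev=$1 mnt=$2
    run mkfs.lustre --reformat --mgs "$dev" || return 1
    run mkdir -p "$mnt" || return 1
    run mount -t lustre "$dev" "$mnt" || return 1
    run umount "$mnt" || return 1
}

check_separate_mgs "${MGS_DEV:-/dev/example_mgt}" "${MGS_MNT:-/mnt/mgs}"
```

Run as-is, the script only prints the four commands. Setting DRY_RUN=0 on a scratch device would actually reformat it, so the default is deliberately the harmless mode.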

            yujian Jian Yu added a comment -

            Sure, I'll do it.

            simmonsja James A Simmons added a comment -

            Could you finish it off? I'm working on a few other tickets.
            yujian Jian Yu added a comment -

            Hi James Simmons,
            Are you going to add a regression test case into conf-sanity.sh or would you like me to do this?


            adilger Andreas Dilger added a comment -

            The patch fixes the core of the problem, but it doesn't add a test. Clearly, we need a conf-sanity test that formats a separate MGS and mounts it, since we can't possibly have had such a test if this bug slipped through.

            jamesanunez James Nunez (Inactive) added a comment -

            Landed to master (pre-2.7)

            cliffw Cliff White (Inactive) added a comment -

            And I have confirmed that we do NOT test internally with a separate MGS, so this was a complete escape.

            cliffw Cliff White (Inactive) added a comment -

            And I can't make a new clean filesystem, either. Hyperion is down.

            MGSNID="192.168.120.5@o2ib"
            MGSDEV="/dev/mapper/iws10_4"
            MDSDEV="/dev/mapper/iws10_1"
            mkfs.lustre --reformat --mgs --device-size=$((512 * 1048576)) --fsname lustre $MGSDEV
            mkfs.lustre --reformat --mdt --index=0 --mkfsoptions='-i 4096' --mgsnid=$MGSNID --device-size=$((2048 * 1048576)) --fsname lustre $MDSDEV

            mkdir /mnt/mgs
            mkdir /mnt/mds

            mount -t lustre $MGSDEV /mnt/mgs
            mount -t lustre $MDSDEV /mnt/mds

            This fails with the same error.

            cliffw Cliff White (Inactive) added a comment -

            This bug has stopped all testing on the stable filesystem, as the stable filesystem (formatted with an older Lustre) can no longer be mounted.

            arg[4] = /dev/mapper/iws11_4
            arg[5] = /mnt/mgs
            source = /dev/mapper/iws11_4 (/dev/mapper/iws11_4), target = /mnt/mgs
            options = rw
            checking for existing Lustre data: found
            Reading CONFIGS/mountdata
            mount.lustre: invalid name 'MGS'

            I really don't want to nuke the stable side, so a fix would be appreciated.

            jamesanunez James Nunez (Inactive) added a comment -

            Thanks for the patch, James.

            People

              jamesanunez James Nunez (Inactive)
              simmonsja James A Simmons