[LU-18436] need simple process to rebuild CONFIGS/mountdata file

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • Lustre 2.14.0, Lustre 2.16.0
    • 3

    Description

      Need a simple mechanism to rebuild/recreate/rewrite the CONFIGS/mountdata file. Currently, it is possible to copy the mountdata file from another MDT and binary-edit it to set the right MDT index (the index is stored once in binary and twice in ASCII), but it should be possible to rebuild this file with tunefs.lustre (possibly by copying it from another MDT or OST and then running tunefs.lustre to reset the index?).
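The manual binary edit amounts to giving a copied mountdata a new server index. The sketch below assumes that the first 200 bytes of `struct lustre_disk_data` match the layout in `lustre_disk.h`; verify this against the headers of your release. The `ldd_prefix` struct and the `ldd_set_index()` helper are hypothetical names used only for illustration. The helper updates the binary `ldd_svindex` field and regenerates the ASCII `ldd_svname` field. It assumes the name format `<fsname>-<MDT|OST><index as 4 hex digits>`.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LDD_NAME_LEN 64

/* Assumed layout of the first 200 bytes of struct lustre_disk_data. */
struct ldd_prefix {
	uint32_t ldd_magic;
	uint32_t ldd_feature_compat;
	uint32_t ldd_feature_rocompat;
	uint32_t ldd_feature_incompat;
	uint32_t ldd_config_ver;
	uint32_t ldd_flags;
	uint32_t ldd_svindex;			/* binary server index */
	uint32_t ldd_mount_type;
	char     ldd_fsname[LDD_NAME_LEN];
	char     ldd_svname[LDD_NAME_LEN];	/* e.g. "testfs-MDT0001" */
	uint8_t  ldd_uuid[40];
};

/*
 * Hypothetical helper: give a copied mountdata a new index.
 * type is "MDT" or "OST"; the "<fsname>-<type>%04x" name format is an
 * assumption. Returns 0 on success, -1 if the new name does not fit.
 */
static int ldd_set_index(struct ldd_prefix *ldd, const char *type,
			 uint32_t index)
{
	char name[LDD_NAME_LEN];
	int n;

	/* ldd_fsname may not be NUL-terminated, so bound it explicitly. */
	n = snprintf(name, sizeof(name), "%.*s-%s%04x",
		     LDD_NAME_LEN, ldd->ldd_fsname, type, (unsigned int)index);
	if (n < 0 || (size_t)n >= sizeof(name))
		return -1;

	ldd->ldd_svindex = index;
	/* Zero the whole field so no bytes from the old name remain. */
	memset(ldd->ldd_svname, 0, sizeof(ldd->ldd_svname));
	memcpy(ldd->ldd_svname, name, (size_t)n);
	return 0;
}
```

This sketch updates only one ASCII copy of the name. The description says the index appears twice in ASCII, so a real tool would have to locate and update the other copy as well. Which field holds it is not stated here.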

          Activity


            "Rahul Bansal <rahulmay94@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58513
            Subject: LU-18436 tunefs: process to rebuild CONFIGS/mountdata file
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: df319957388005a5f7e1aebb61284c4c6a963a73

            Comment by Gerrit Updater.

            It is probably less complex to have separate "--mountdata-dev=MDTDEV" and "--mountdata-file=FILE" options, so that the code doesn't have to be fancy about detecting whether this is an ldiskfs filesystem image or mountdata file (though it could do this by checking if it is a block device (ldiskfs), or reading the file and checking for LDD_MAGIC or EXT4_MAGIC).
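The detection that a single combined option would need is short. The sketch below is hypothetical (`classify_source()` and `enum md_source` are illustrative names) and rests on three assumptions. First, `LDD_MAGIC` is `0x1dd00001` as defined in `lustre_disk.h`. Second, mountdata is written in host byte order by a plain `write()` of the struct. Third, the standard ext2/3/4 layout applies: the superblock sits at byte 1024 and `s_magic` (`0xEF53`, little-endian) sits at offset `0x38` within it.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define LDD_MAGIC         0x1dd00001U	/* assumed value, from lustre_disk.h */
#define EXT4_MAGIC        0xEF53U	/* ext2/3/4 superblock s_magic */
#define EXT4_MAGIC_OFFSET (1024 + 0x38)	/* superblock at 1024, s_magic at 0x38 */

enum md_source {
	MD_SRC_UNKNOWN,
	MD_SRC_BLOCKDEV,	/* block device: treat as an ldiskfs target */
	MD_SRC_LDISKFS_IMAGE,	/* regular file holding an ext4/ldiskfs image */
	MD_SRC_MOUNTDATA,	/* copied CONFIGS/mountdata file */
};

/* Hypothetical helper for a combined "--mountdata-from=PATH" option. */
static enum md_source classify_source(const char *path)
{
	enum md_source kind = MD_SRC_UNKNOWN;
	unsigned char buf[4];
	uint32_t magic;
	struct stat st;
	FILE *f;

	if (stat(path, &st) != 0)
		return MD_SRC_UNKNOWN;
	if (S_ISBLK(st.st_mode))
		return MD_SRC_BLOCKDEV;

	f = fopen(path, "rb");
	if (f == NULL)
		return MD_SRC_UNKNOWN;

	/* Assumed: mountdata is in host byte order, so compare natively. */
	if (fread(buf, 1, sizeof(buf), f) == sizeof(buf)) {
		memcpy(&magic, buf, sizeof(magic));
		if (magic == LDD_MAGIC)
			kind = MD_SRC_MOUNTDATA;
	}

	/* The ext4 superblock is always little-endian on disk. */
	if (kind == MD_SRC_UNKNOWN &&
	    fseek(f, EXT4_MAGIC_OFFSET, SEEK_SET) == 0 &&
	    fread(buf, 1, 2, f) == 2 &&
	    ((unsigned int)buf[0] | (unsigned int)buf[1] << 8) == EXT4_MAGIC)
		kind = MD_SRC_LDISKFS_IMAGE;

	fclose(f);
	return kind;
}
```

With separate `--mountdata-dev` and `--mountdata-file` options, as suggested, this detection is unnecessary. The sketch only shows that the combined form is also feasible.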

            The "--mountdata-dev=MDTDEV" option would need to pass the MDTDEV device name to lustre/utils/libmount_utils_ldiskfs.c::ldiskfs_read_ldd() and then close and zero the backfs handle, so that this is not re-used in ldiskfs_write_ldd() when it is called. That would be bad, since it would potentially modify the source MDT CONFIGS/mountdata file instead of the broken one, causing even more problems...
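The hazard in the `--mountdata-dev` case is a cached handle that outlives the read. The simplified sketch below illustrates why the handle must be both closed and reset before the write path runs. All names are hypothetical. A plain file descriptor on an ordinary file stands in for the libext2fs `backfs` handle used in `libmount_utils_ldiskfs.c`; this substitution is an assumption made only to keep the example self-contained.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the cached backing-filesystem handle. */
static int backfs_fd = -1;

/* Read len bytes of mountdata from the SOURCE target, then drop the handle. */
static int read_ldd_from_source(const char *src_dev, void *ldd, size_t len)
{
	ssize_t rc;

	backfs_fd = open(src_dev, O_RDONLY);
	if (backfs_fd < 0)
		return -1;
	rc = read(backfs_fd, ldd, len);

	/*
	 * Close AND reset the handle. Leaving it open would let the write
	 * path below reuse it on the source device; closing without reset
	 * leaves a stale descriptor number that a later open() may reuse.
	 */
	close(backfs_fd);
	backfs_fd = -1;
	return rc == (ssize_t)len ? 0 : -1;
}

/* Write mountdata to the TARGET, reusing a cached handle if one is set. */
static int write_ldd_to_target(const char *target_dev, const void *ldd,
			       size_t len)
{
	int fd = backfs_fd;
	ssize_t rc = -1;

	if (fd < 0)
		fd = open(target_dev, O_WRONLY);
	if (fd < 0)
		return -1;
	if (lseek(fd, 0, SEEK_SET) == 0)
		rc = write(fd, ldd, len);
	if (fd != backfs_fd)
		close(fd);
	return rc == (ssize_t)len ? 0 : -1;
}
```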

            In the "--mountdata-file=FILE" case, the ldiskfs_read_ldd() code is useless, and it could just use:

                    fd = open(file, O_RDONLY);
                    if (fd < 0)
                            ...;
                    rc = read(fd, mo_ldd, sizeof(*mo_ldd));
                    if (rc < (ssize_t)sizeof(*mo_ldd))
                            ...;
                    close(fd);
            

            (with appropriate error handling).
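Below is a self-contained sketch of the same read with error handling. `read_mountdata_file()` and `ldd_prefix` are hypothetical names. The struct assumes the first 200 bytes of `struct lustre_disk_data` from `lustre_disk.h`; a real implementation would read the full structure. `LDD_MAGIC` is likewise assumed to be `0x1dd00001`.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define LDD_MAGIC 0x1dd00001U	/* assumed value, from lustre_disk.h */

/* Assumed layout of the first 200 bytes of struct lustre_disk_data. */
struct ldd_prefix {
	uint32_t ldd_magic;
	uint32_t ldd_feature_compat;
	uint32_t ldd_feature_rocompat;
	uint32_t ldd_feature_incompat;
	uint32_t ldd_config_ver;
	uint32_t ldd_flags;
	uint32_t ldd_svindex;
	uint32_t ldd_mount_type;
	char     ldd_fsname[64];
	char     ldd_svname[64];
	uint8_t  ldd_uuid[40];
};

/* Hypothetical --mountdata-file=FILE reader: 0 on success, else -errno. */
static int read_mountdata_file(const char *file, struct ldd_prefix *mo_ldd)
{
	ssize_t rc;
	int err;
	int fd;

	fd = open(file, O_RDONLY);
	if (fd < 0)
		return -errno;

	rc = read(fd, mo_ldd, sizeof(*mo_ldd));
	err = errno;		/* save before close() can change it */
	close(fd);

	if (rc < 0)
		return -err;
	if (rc < (ssize_t)sizeof(*mo_ldd))
		return -EINVAL;	/* short file: not a complete mountdata */
	if (mo_ldd->ldd_magic != LDD_MAGIC)
		return -EINVAL;	/* wrong magic: not a mountdata file */
	return 0;
}
```

Checking the magic after the read also guards against being handed the wrong file, such as an ldiskfs image passed to the file option.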

            I do notice an oddity in ldiskfs_read_ldd(): it runs the "e2label" command even if ext2fs_file_read() is successful? It isn't really clear that this is needed, or whether it is only needed if the mo_ldd->ldd_svname field is empty after a successful read?

            Similarly, I notice that ldiskfs_write_ldd() uses libext2fs to open the filesystem directly to set the MMP feature flag, but then closes the filesystem, mounts it on a temporary mountpoint, and calls mkdir(MOUNT_CONFIGS_DIR), open(MOUNT_DATA_FILE), and write(mo_ldd) to write the CONFIGS/mountdata file, when it could do this directly via libext2fs. We've definitely had issues with this extra mount/unmount cycle in the past (e.g. LU-13241, LU-7002), and it would be good to get rid of it. I filed LU-18818 to track these issues, since they are somewhat independent of this one and could be completed independently.
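The narrower e2label condition raised in this comment can be expressed as a small predicate. The sketch is hypothetical and is not the current ldiskfs_read_ldd() logic. It consults the label only when reading CONFIGS/mountdata failed, or when the read succeeded but left the server name empty. The `label` argument stands in for the e2label output; how that output is obtained is out of scope.

```c
#include <stdio.h>

/*
 * Hypothetical helper. read_rc follows the usual 0 / -errno convention.
 * Returns 0 if svname was kept, 1 if it was taken from the label,
 * -1 if there was nothing usable to fall back to.
 */
static int fill_svname_from_label(int read_rc, char *svname, size_t len,
				  const char *label)
{
	if (read_rc == 0 && svname[0] != '\0')
		return 0;	/* mountdata already names the target */
	if (label == NULL || label[0] == '\0')
		return -1;	/* no label to fall back to */
	snprintf(svname, len, "%s", label);
	return 1;
}
```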

            Comment by Andreas Dilger.

            There are two cases that I can think of for mountdata to be missing:

            • mount data file is completely missing for some reason (corruption caused the admin or e2fsck to remove it). It needs to be rebuilt from the available information (e.g. the ldiskfs filesystem label).
            • directory corruption has put this file into lost+found. I'm not sure if LFSCK will find it in this case and restore it?
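The first case relies on rebuilding from the filesystem label. The sketch below recovers the fields a fresh mountdata would need. It assumes labels of the form `<fsname>-<MDT|OST><hex index>`, as in `testfs-MDT0003`. It also accepts `:` as the separator, on the unverified assumption that never-registered targets use labels like `testfs:MDT0000`. `parse_target_label()` and `struct target_id` are hypothetical names.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; 64 matches the assumed ldd_fsname size. */
struct target_id {
	char fsname[64];
	char type[4];		/* "MDT" or "OST" */
	unsigned int index;
};

/* Returns 0 on success, -1 if the label does not match the assumed form. */
static int parse_target_label(const char *label, struct target_id *id)
{
	const char *sep = NULL;
	const char *p;
	unsigned long idx;
	size_t fslen;
	char *end;

	/* Use the last separator; the fsname is everything before it. */
	for (p = label; *p != '\0'; p++)
		if (*p == '-' || *p == ':')
			sep = p;
	if (sep == NULL || sep == label)
		return -1;

	fslen = (size_t)(sep - label);
	if (fslen >= sizeof(id->fsname))
		return -1;
	if (strncmp(sep + 1, "MDT", 3) != 0 && strncmp(sep + 1, "OST", 3) != 0)
		return -1;

	/* Require a hex digit first: strtoul() would skip spaces or take a sign. */
	if (!isxdigit((unsigned char)sep[4]))
		return -1;
	idx = strtoul(sep + 4, &end, 16);
	if (*end != '\0' || idx > UINT_MAX)
		return -1;

	memcpy(id->fsname, label, fslen);
	id->fsname[fslen] = '\0';
	memcpy(id->type, sep + 1, 3);
	id->type[3] = '\0';
	id->index = (unsigned int)idx;
	return 0;
}
```

A label such as `testfs-MGS` is rejected. Rebuilding an MGS-only target would need separate handling.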

            Something like "lctl --mountdata-reset /dev/MDTDEV" would be good to reset the UUID/index in the mountdata file, but the file would still need to be copied first, which is a bit of a burden for admins under pressure. It would be even better if there were an option like "--mountdata-from={MDTDEV|FILE}" to copy it from a source device (or from a file, if it was copied from a remote server).
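Neither `--mountdata-reset` nor `--mountdata-from` exists today; both are proposals from this comment. The sketch below shows how the proposed options could be parsed with `getopt_long()`. Several details are assumptions: the `struct md_opts` and `parse_md_opts()` names, combining both options in one command, and requiring a single trailing target-device argument.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical parsed form of the proposed options. */
struct md_opts {
	const char *from;	/* --mountdata-from=MDTDEV|FILE */
	int reset;		/* --mountdata-reset */
	const char *target;	/* device whose mountdata is being repaired */
};

/* Returns 0 on success, -1 on an unknown option or wrong argument count. */
static int parse_md_opts(int argc, char **argv, struct md_opts *o)
{
	static const struct option longopts[] = {
		{ "mountdata-from",  required_argument, NULL, 'f' },
		{ "mountdata-reset", no_argument,       NULL, 'r' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	o->from = NULL;
	o->reset = 0;
	o->target = NULL;
	optind = 1;
	opterr = 0;	/* report errors through the return value instead */

	while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
		switch (c) {
		case 'f':
			o->from = optarg;
			break;
		case 'r':
			o->reset = 1;
			break;
		default:
			return -1;
		}
	}
	if (optind != argc - 1)
		return -1;	/* expect exactly one target device */
	o->target = argv[optind];
	return 0;
}
```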

            Comment by Andreas Dilger.

            Can you please share a scenario in which we need to rewrite the CONFIGS/mountdata file?
            I want to understand what type of problems are being targeted here.
            It will also be helpful in writing test cases.

            Comment by Rahul Bansal.

            Once I understand the specifics of this task (I am new to Lustre), I will suggest whether other files/configurations could be restored in a similar manner.

            Comment by Rahul Bansal.

            This is very ldiskfs-specific. I wonder if we could provide a universal interface instead.

            Comment by James A Simmons.

            People

              Rahul Bansal
              Andreas Dilger
              Votes: 0
              Watchers: 5
