Lustre / LU-9799

mount doesn't return an error when failing

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.10.1, Lustre 2.11.0
    • Affects Version/s: Lustre 2.10.0
    • Labels: None
    • Environment: Lustre: Build Version: 2.10.0_5_gbb3c407
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      When mount -t lustre ... has failed to actually mount a target, the exit code of mount does not reflect this:

      # mount -t lustre zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk13/MGS /mnt/MGS
      e2label: No such file or directory while trying to open zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk13/MGS
      Couldn't find valid filesystem superblock.
      # echo $?
      0
      

      This of course wreaks havoc on systems such as IML, which rely on the exit code of each step in the process of starting a filesystem to decide whether to continue with subsequent steps.
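      As a minimal sketch of that gating, the shell fragment below uses a hypothetical failing_mount function to stand in for the buggy mount.lustre behaviour; the names and messages are illustrative, not from the Lustre code:

```shell
# failing_mount is a hypothetical stand-in for the behaviour seen above:
# it prints an error message but still exits 0.
failing_mount() {
    echo "Couldn't find valid filesystem superblock." >&2
    return 0    # the bug: the failure is not reflected in the exit code
}

# Automation gates each step on the previous step's exit code, so the
# failure slips through and the subsequent steps run anyway.
if failing_mount; then
    echo "automation continues (wrongly)"
fi
```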

    Attachments

    Issue Links

    Activity

            [LU-9799] mount doesn't return an error when failing

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28581/
            Subject: LU-9799 mount: Call read_ldd with initialized mount type
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: f869b9da902fc305bfab8e902d0c1202aec6a7bc

            gerrit Gerrit Updater added a comment -

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28581
            Subject: LU-9799 mount: Call read_ldd with initialized mount type
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 776cae93b762e819cf80eed04d57bfc4040f09f0

            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28456/
            Subject: LU-9799 mount: Call read_ldd with initialized mount type
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 0108281c65545df169faaa0ce0690fb021680643

            gerrit Gerrit Updater added a comment -

            Nathaniel Clark (nathaniel.l.clark@intel.com) uploaded a new patch: https://review.whamcloud.com/28456
            Subject: LU-9799 mount: Call read_ldd with initialized mount type
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: aeffd031d67a99896657617f7acd6dafd7a7722c

            utopiabound Nathaniel Clark added a comment -

            Okay, I was testing something totally different and ran into this:

            [root@ieel-mds03 ~]# mount -t lustre MGS/MGT /mnt/MGT
            e2label: No such file or directory while trying to open MGS/MGT
            Couldn't find valid filesystem superblock.
            

            But on closer inspection:

            [root@ieel-mds03 ~]# df
            Filesystem                      1K-blocks     Used Available Use% Mounted on
            /dev/mapper/cl_ieel--mds03-root   6486016  1867296   4618720  29% /
            devtmpfs                           496568        0    496568   0% /dev
            tmpfs                              508324    39216    469108   8% /dev/shm
            tmpfs                              508324    13188    495136   3% /run
            tmpfs                              508324        0    508324   0% /sys/fs/cgroup
            /dev/sda1                         1038336   193444    844892  19% /boot
            ieel-storage:/home               40572928 38486912   2086016  95% /home
            tmpfs                              101668        0    101668   0% /run/user/0
            MGS                               5047168        0   5047168   0% /MGS
            MGS/MGT                           5007744        0   5005696   0% /mnt/MGT
            

            The filesystem did mount, and on each umount and remount I get the same error message, but the mount succeeds.


            utopiabound Nathaniel Clark added a comment -

            Right. But something similar must be happening, though I am at a loss as to what.

            brian Brian Murrell (Inactive) added a comment -

            Ahhh. So, relevant only to force that codepath. But clearly that is not the problem in my case.

            utopiabound Nathaniel Clark added a comment -

            The name of the pool is absolutely key for my reproduction:

            lustre/utils/libmount_utils_ldiskfs.c:

             448 /* Check whether the file exists in the device */
             449 static int file_in_dev(char *file_name, char *dev_name)
             450 {
             451         FILE *fp;
             452         char debugfs_cmd[256];
             453         unsigned int inode_num;
             454         int i;
             455 
             456         /* Construct debugfs command line. */
             457         snprintf(debugfs_cmd, sizeof(debugfs_cmd),
             458                  "%s -c -R 'stat %s' '%s' 2>&1 | egrep '(Inode|unsupported)'",
             459                  DEBUGFS, file_name, dev_name);
             460 
            

            Notice the ...|egrep... on line 458. That will report output text if the egrep pattern matches the pool name in an error message, since stderr is also redirected through the egrep.
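            A standalone sketch of that failure mode (the pool name zpool_Inode_test is hypothetical, chosen so the error text matches the egrep pattern; the error message is a stand-in for real debugfs output):

```shell
# Stand-in for the debugfs invocation built on line 458. Because stderr
# is folded into stdout *before* the pipe, error text flows through the
# egrep as well. If the device/pool name embedded in the error happens
# to match the pattern, the caller sees "output" even though the
# command failed.
dev="zpool_Inode_test"    # hypothetical name that matches 'Inode'
sh -c "echo \"cannot open $dev: No such file or directory\" >&2; exit 1" 2>&1 \
    | egrep '(Inode|unsupported)'
echo "pipeline exit status: $?"    # 0 -- egrep matched the error text
```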


            brian Brian Murrell (Inactive) added a comment -

            utopiabound:

            But the ZFS, OS, e2fsprogs versions aren't.

            It's probably moot, but just for clarity, the ZFS version is whatever is built by Jenkins with b2_10. e2fsprogs is most recent GA and O/S is RHEL 7.4. I doubt any of these are particularly relevant though.

            I can get sort of close, but not with your pool name:

            I think you got much more than just "sort of close". I think you got an exact reproduction. The names of the pools, etc., are, I think, quite irrelevant.

            The question still stands, though: when the e2label call exists only in the ldiskfs OSD codepath, in ldiskfs_read_ldd(), why is it being hit for a ZFS-formatted target?

            My reading of the code is that by the time osd_read_ldd() is supposed to call either zfs_read_ldd() or ldiskfs_read_ldd(), the format of the target is known and stored in ldd->ldd_mount_type, so only the relevant one of the two should be called, not both. So why are we getting an error from the e2label call that exists only in ldiskfs_read_ldd()?
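            That dispatch can be sketched in shell (a toy model, not the actual C code; the function below merely mimics the selection logic): if the mount type is not initialized when read_ldd is called, the ldiskfs branch runs against a ZFS target, which is exactly where the e2label/debugfs noise comes from, and what the "Call read_ldd with initialized mount type" fix addresses.

```shell
# Simplified model of osd_read_ldd(): the backend is selected from
# ldd_mount_type. With an uninitialized (empty) type, the wrong
# backend's read_ldd runs.
osd_read_ldd() {
    case "$1" in
        zfs)     echo "zfs_read_ldd" ;;
        ldiskfs) echo "ldiskfs_read_ldd" ;;
        *)       echo "ldiskfs_read_ldd (type '$1' was uninitialized)" ;;
    esac
}

osd_read_ldd ""      # before the fix: falls into the ldiskfs path
osd_read_ldd zfs     # with an initialized type: correct backend chosen
```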


    People

      Assignee: utopiabound Nathaniel Clark
      Reporter: brian Brian Murrell (Inactive)
      Votes: 0
      Watchers: 5

    Dates

      Created:
      Updated:
      Resolved: