
umount hanging in modern distros when OST is unavailable

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
    • Severity: 3

    Description

      It looks like modern distros (SLES12, Fedora 21+, possibly earlier) always call statfs on the mountpoint before issuing the umount call.

      If one of the OSTs is unavailable at that time, the statfs will hang.

      In our test suite, a bunch of tests in conf-sanity have this problem, and it was "dealt with" in LU-5472 by just adding the -f option to umount.
      But this still leaves places like recovery-single test 89 and something in recovery-small, at the very least.

      Most of all, regular users would be affected too. So I wonder if we should deal with this issue somewhat more intelligently, so that the unmount does not hang in such a case?
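A test script that wants to detect this condition without blocking could run statfs in a child process under a deadline. The following is only a minimal sketch of that idea; the helper name `probe_statfs` and its return codes are made up for illustration and are not part of Lustre or the test framework.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/vfs.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: call statfs() on `path` in a child process and
 * wait at most `timeout_sec` seconds for it.
 * Returns 0 if statfs() succeeded, 1 if it failed, 2 if it did not
 * finish in time, and -1 if the child could not be created or reaped.
 */
int probe_statfs(const char *path, int timeout_sec)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct statfs sfs;

        /* Child: this is the call that may block on an unavailable OST. */
        _exit(statfs(path, &sfs) == 0 ? 0 : 1);
    }

    /* Parent: poll the child every 10 ms until the deadline passes. */
    for (long waited_ms = 0; waited_ms < timeout_sec * 1000L; waited_ms += 10) {
        struct timespec ts = { 0, 10 * 1000 * 1000 };
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
        if (r < 0)
            return -1;
        nanosleep(&ts, NULL);
    }

    /*
     * Deadline passed. A child stuck in an uninterruptible wait may not
     * die right away, so do not block waiting for it here.
     */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, WNOHANG);
    return 2;
}
```

A caller could treat a return value of 2 as "statfs would hang" and fall back to umount -f, which is what the conf-sanity workaround does unconditionally.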

          Activity

            [LU-7759] umount hanging in modern distros when OST is unavailable

            adilger Andreas Dilger added a comment -

            Rather than open up an old ticket that hasn't been seen in 2+ years, it is better to open up a new ticket for the new problem, and if they seem related they can be linked together. Otherwise, tracking the old re-opened issue fix version is more difficult than necessary.

            jay Jinshan Xiong (Inactive) added a comment -

            There is a new occurrence in a recent test: https://testing.hpdd.intel.com/test_sets/0d7d92de-cad2-11e7-9840-52540065bddc

            The test failure looks the same, but I am not sure whether it is the same issue.
            ys Yang Sheng added a comment -

            Patch landed. Close ticket.


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19195/
            Subject: LU-7759 llite: handle inactive OSTs better in statfs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 23fde1f89bec0adf4f7181ccce5a236eac371a38

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18820/
            Subject: LU-7759 utils: build mount.lustre with libmount
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f1de339d881958de8fc47065fb31a5c8e0c14b60

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/19195
            Subject: LU-7759 llite: handle inactive OSTs better in statfs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 169a33aaf665fdaef6fc5734665a04d758a443e9

            gerrit Gerrit Updater added a comment -

            Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/18820
            Subject: LU-7759 utils: build mount.lustre with libmount
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ffebbb3e30c247e453480f9219985effdf035b05
            ys Yang Sheng added a comment -

            I found a way to handle it. We can add an entry to utab (located at /run/mount/utab by default) when mounting a Lustre filesystem. umount will then use this entry to find the fstype, which avoids invoking statfs. Changing mount_lustre.c is all that is needed. I'll produce a patch.
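The general idea of taking the fstype from a mount table instead of calling statfs can be sketched as below. Note that this is not the utab mechanism described in the comment above: it parses one line in the /proc/self/mountinfo format, and the function name `fstype_from_mountinfo_line` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given one line in /proc/self/mountinfo format,
 * copy the filesystem type into `out` if the line describes `target`.
 * Returns 0 on a match, -1 otherwise.
 *
 * Line layout (see proc(5)):
 *   ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTS [optional...] - FSTYPE SRC SUPEROPTS
 * Mount points containing spaces are escaped (\040) and are compared
 * here in their escaped form.
 */
int fstype_from_mountinfo_line(const char *line, const char *target,
                               char *out, size_t outlen)
{
    char mnt[256];
    char type[64];
    const char *sep;

    /* Skip ID, PARENT, MAJ:MIN and ROOT, then read the mount point. */
    if (sscanf(line, "%*d %*d %*s %*s %255s", mnt) != 1)
        return -1;
    if (strcmp(mnt, target) != 0)
        return -1;

    /* The fstype is the first field after the " - " separator. */
    sep = strstr(line, " - ");
    if (sep == NULL || sscanf(sep + 3, "%63s", type) != 1)
        return -1;

    snprintf(out, outlen, "%s", type);
    return 0;
}
```

Because the lookup only reads text that the kernel already has in memory, it does not need to contact any OST, which is the property the utab approach relies on as well.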

            adilger Andreas Dilger added a comment -

            I guess it is possible to check strstr(current->comm, "umount") and set sbi->ll_statfs |= LL_SBI_LAZYSTATFS on the filesystem, but this would add overhead to every statfs() call.

            It might be possible to set LL_SBI_LAZYSTATFS by default, but that may also cause problems with the recovery tests that wait on "df" to return to indicate recovery is complete.
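The check suggested in this comment can be mocked up in user space roughly as follows. This is a sketch only: `struct demo_sbi`, `DEMO_LAZYSTATFS`, and both function names are stand-ins invented for illustration, and the flag value is not the real LL_SBI_LAZYSTATFS bit. In the kernel the command name would come from current->comm.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the LL_SBI_LAZYSTATFS bit. */
#define DEMO_LAZYSTATFS 0x1u

/* Hypothetical stand-in for the per-superblock Lustre state. */
struct demo_sbi {
    unsigned int flags;
};

/* True if the calling command's name contains "umount". */
bool comm_is_umount(const char *comm)
{
    return comm != NULL && strstr(comm, "umount") != NULL;
}

/*
 * Would run at the start of every statfs() call, which is the per-call
 * overhead the comment warns about. Once set, the flag stays set.
 */
void demo_statfs_prepare(struct demo_sbi *sbi, const char *comm)
{
    if (comm_is_umount(comm))
        sbi->flags |= DEMO_LAZYSTATFS;
}
```

A substring match also fires for helpers such as umount.lustre and for any unrelated command whose name happens to contain "umount", so it is a heuristic rather than a reliable way to identify the caller.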
            ys Yang Sheng added a comment - - edited

            It looks like umount.lustre is called after statfs. umount needs to figure out the fstype before invoking umount.{fstype}, which is why it calls statfs. The hard part is that there is no way to tell the kernel that the statfs call came from umount, so avoiding the statfs call is the only thing we can do. Alternatively, skipping unavailable OSTs may be a reasonable solution?
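To make the fstype lookup concrete: statfs reports a magic number in f_type, and a tool can map that number to a filesystem name. The sketch below shows the mapping in isolation. The function names and the `DEMO_*` constants are illustrative; the Lustre value is assumed to match LL_SUPER_MAGIC from the Lustre headers, and this is not a copy of what umount actually does.

```c
#include <stddef.h>
#include <sys/vfs.h>

/* Magic numbers as reported in statfs.f_type. */
#define DEMO_EXT_MAGIC    0xEF53UL      /* shared by ext2, ext3 and ext4 */
#define DEMO_TMPFS_MAGIC  0x01021994UL
#define DEMO_LUSTRE_MAGIC 0x0BD00BD0UL  /* assumed to equal LL_SUPER_MAGIC */

/* Map a magic number to a filesystem name, or NULL if unknown. */
const char *fstype_from_magic(unsigned long magic)
{
    switch (magic & 0xFFFFFFFFUL) {
    case DEMO_EXT_MAGIC:
        return "ext2/3/4";
    case DEMO_TMPFS_MAGIC:
        return "tmpfs";
    case DEMO_LUSTRE_MAGIC:
        return "lustre";
    default:
        return NULL;
    }
}

/*
 * Look up the filesystem name for `path` via statfs(). On a Lustre
 * mount this is the call that can block while an OST is unavailable.
 */
const char *fstype_by_statfs(const char *path)
{
    struct statfs sfs;

    if (statfs(path, &sfs) != 0)
        return NULL;
    return fstype_from_magic((unsigned long)sfs.f_type);
}
```

The mapping itself is cheap; the cost is entirely in the statfs call, which is why the discussion focuses on either avoiding that call or making it return without waiting for inactive OSTs.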

            People

              ys Yang Sheng
              green Oleg Drokin
              Votes: 0
              Watchers: 12
