Lustre / LU-14399

mount MDT takes very long with hsm enable

    Description

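      We observed that when mounting an MDT with HSM enabled, the mount command takes minutes, compared to seconds before. The log below shows a gap of roughly 100 seconds between the LDISKFS mount and the first mdt_hsm_cdt_start() message: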

      [53618.238941] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre -o localrecov  /dev/mapper/mds2_flakey /mnt/lustre-mds2
      [53618.624098] LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      [53720.390690] Lustre: 1722736:0:(mdt_coordinator.c:1114:mdt_hsm_cdt_start()) lustre-MDT0001: trying to init HSM before MDD
      [53720.392834] LustreError: 1722736:0:(mdt_coordinator.c:1125:mdt_hsm_cdt_start()) lustre-MDT0001: cannot take the layout locks needed for registered restore: -2
      [53720.398049] LustreError: 1722741:0:(mdt_coordinator.c:1090:mdt_hsm_cdt_start()) lustre-MDT0001: Coordinator already started or stopping
      [53720.400681] Lustre: lustre-MDT0001: Imperative Recovery not enabled, recovery window 60-180
      [53720.424872] Lustre: lustre-MDT0001: in recovery but waiting for the first client to connect
      [53720.953893] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
      [53722.067555] Lustre: DEBUG MARKER:  
      

      This seems related to LU-13920.
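
      To put a number on the delay, the gap between two kernel log lines can be computed from their "[seconds.microseconds]" prefixes. The following is a minimal userspace sketch, not part of Lustre; the helper names parse_stamp() and stamp_gap() are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse the "[seconds.micros]" prefix of a kernel
 * log line. Returns 0 on success, -1 if the line has no such prefix. */
static int parse_stamp(const char *line, double *out)
{
	assert(line != NULL && out != NULL);
	return sscanf(line, " [%lf]", out) == 1 ? 0 : -1;
}

/* Hypothetical helper: store the time in seconds from `first` to `second`
 * in *gap. Returns 0 on success, -1 if either line could not be parsed. */
static int stamp_gap(const char *first, const char *second, double *gap)
{
	double a, b;

	if (parse_stamp(first, &a) != 0 || parse_stamp(second, &b) != 0)
		return -1;
	*gap = b - a;
	return 0;
}
```

      Applied to the LDISKFS-fs line and the first mdt_hsm_cdt_start() line above, this gives a gap of about 101.77 seconds.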

          Activity

            gerrit Gerrit Updater added a comment -

            Sergey Cheremencev (sergey.cheremencev@hpe.com) uploaded a new patch: https://review.whamcloud.com/42005
            Subject: LU-14399 tests: hsm_actions after failover
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 87493ef365d9faaf1f6c1e1a40f65157d37f72dc

            gerrit Gerrit Updater added a comment -

            Sergey Cheremencev (sergey.cheremencev@hpe.com) uploaded a new patch: https://review.whamcloud.com/41445
            Subject: LU-14399 hsm: process hsm_actions in coordinator
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ec9b9b8d2e05568f75ca75e596585503ae0d4216

            scherementsev Sergey Cheremencev added a comment -

            "here is the sequence that I did to hit this bug"

            Thank you. Now the problem is clear.

            I'll provide a fix in a few days.

            mdiep Minh Diep added a comment -

            here is the sequence that I did to hit this bug

            1. mkfs.lustre --mdt --mgsnode ....
            2. tunefs.lustre --param mdt.hsm_control=enabled ...
            3. mount -t lustre ...

            gerrit Gerrit Updater added a comment -

            John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41415
            Subject: LU-14399: Revert "LU-13651 hsm: call hsm_find_compatible_cb() only for cancel"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4ed2cd37d3331511ffff1fcd2bbe53c9b2513502

            scherementsev Sergey Cheremencev added a comment -

                    /* wait until MDD initialize hsm actions llog */
                    while (!test_bit(MDT_FL_CFGLOG, &mdt->mdt_state) && i < obd_timeout) {
                            schedule_timeout_interruptible(cfs_time_seconds(1));
                            i++;
                    }
                    if (!test_bit(MDT_FL_CFGLOG, &mdt->mdt_state))
                            CWARN("%s: trying to init HSM before MDD\n", mdt_obd_name(mdt));

            LU-13920 just waits, for up to obd_timeout seconds, until MDT_FL_CFGLOG is set.
            This flag is set at the end of mdt_prepare(), so it is possible that mdt_prepare() got stuck for some reason.
            In any case, I need debug logs from the MDT before I can say more.
            Is it possible to reproduce the problem and gather debug logs?

            gerrit Gerrit Updater added a comment -

            John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41409
            Subject: LU-14399 Revert "LU-13920 hsm: process hsm_actions only after mdd setup"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 5b375e90e3cc3538f4dd92dc81f98fcf5b98e41e

            People

              scherementsev Sergey Cheremencev
              mdiep Minh Diep