Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.8.0
    • Lustre 2.7.0

    Description

      As mentioned in LU-6683, I ran into a situation where lctl lfsck_stop just hangs indefinitely.

      I have managed to reproduce this twice:

      Start lfsck (using lctl lfsck_start -M play01-MDT0000 -t layout); this crashes the OSS servers. Reboot the servers and restart the OSTs. Attempting to stop the lfsck in this state just hangs: I waited more than an hour and it was still hanging. Unmounting the MDT in this situation also appears to hang (after 30 minutes I power cycled the MDS).
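When reproducing a hang like this, it can help to put a deadline on the stop command so the shell is not blocked indefinitely. The following is only a sketch: run_with_deadline is a hypothetical helper name, it assumes GNU coreutils timeout is installed, and it assumes the hung process can still be signalled. If the process is blocked in an uninterruptible kernel wait, the signal may not take effect and the helper may itself stall.

```shell
# Hypothetical helper (not part of Lustre): run a command under a time limit
# and report whether it finished or had to be timed out.
# Assumes GNU coreutils `timeout`, which exits with status 124 on timeout.
run_with_deadline() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        # The command was still running when the limit expired.
        echo "HUNG after ${limit}s: $*"
    else
        echo "exit ${rc}: $*"
    fi
    return "$rc"
}
```

On the MDS, a call such as run_with_deadline 60 lctl lfsck_stop -M play01-MDT0000 would then either report the exit status of the stop request or flag it as hung after 60 seconds. The 60 second limit is an arbitrary illustrative value.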

      Attachments

        1. 15.lctl.tgz
          631 kB
        2. lustre.dmesg.bz2
          37 kB
        3. lustre.log.bz2
          1.38 MB

          Activity

            [LU-6684] lctl lfsck_stop hangs

            standan Saurabh Tandan (Inactive) added a comment - Another instance found for tag 2.7.66 for Full - EL6.7 Server/EL6.7 Client
            On master, build# 3314
            https://testing.hpdd.intel.com/test_sets/35490a0c-ca6e-11e5-9215-5254006e85c2
            Date : 02/02/2016 Time: 9:20 am MST

            yong.fan nasf (Inactive) added a comment - The patch has been landed to master.

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18082/
            Subject: LU-6684 lfsck: set the lfsck notify as interruptable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 069a9cf551c2e985ea254a1c570b22ed1d72d914

            yujian Jian Yu added a comment - This is blocking patch review testing on master branch: https://testing.hpdd.intel.com/test_sets/a29caebe-c709-11e5-9b6d-5254006e85c2 https://testing.hpdd.intel.com/test_sets/fbfee2be-c70f-11e5-a037-5254006e85c2
            bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/150c07e2-c575-11e5-825e-5254006e85c2

            simmonsja James A Simmons added a comment - This is also delaying the landing of several patches.
            bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/85d45ece-c0bc-11e5-9620-5254006e85c2

            gerrit Gerrit Updater added a comment - Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/18082
            Subject: LU-6684 lfsck: set the lfsck notify as interruptable
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 68c078328be253735658fcf43fa98afff936ec6c

            gerrit Gerrit Updater added a comment - James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/18059
            Subject: Revert "LU-6684 lfsck: stop lfsck even if some servers offline"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2505fd07b29ebfddcd29f16954908f6fe4670276

            jamesanunez James Nunez (Inactive) added a comment (edited) - More failures on master, all on builds that include the previous patch landed for this ticket:
            2016-01-15 15:29:21 - https://testing.hpdd.intel.com/test_sets/48126330-bbce-11e5-8506-5254006e85c2
            2016-01-15 20:20:20 - https://testing.hpdd.intel.com/test_sets/7ec04c5e-bbfa-11e5-acbb-5254006e85c2
            2016-01-16 00:40:11 - https://testing.hpdd.intel.com/test_sets/4988556c-bc05-11e5-8f65-5254006e85c2
            2016-01-18 22:08:02 - https://testing.hpdd.intel.com/test_sets/3a54dfd8-be63-11e5-92e8-5254006e85c2
            2016-01-18 22:59:29 - https://testing.hpdd.intel.com/test_sets/642d055a-be69-11e5-92e8-5254006e85c2
            2016-01-18 23:21:01 - https://testing.hpdd.intel.com/test_sets/c75e157e-be6e-11e5-b113-5254006e85c2
            2016-01-19 07:37:19 - https://testing.hpdd.intel.com/test_sets/325db7ae-beb4-11e5-8c8a-5254006e85c2
            2016-01-19 12:10:06 - https://testing.hpdd.intel.com/test_sets/144d9d36-bed9-11e5-ad7e-5254006e85c2
            2016-01-19 22:11:45 - https://testing.hpdd.intel.com/test_sets/a2f0fede-bf2e-11e5-a659-5254006e85c2
            2016-01-19 22:26:33 - https://testing.hpdd.intel.com/test_sets/dc0ed974-bf2f-11e5-8f04-5254006e85c2
            2016-01-19 23:59:17 - https://testing.hpdd.intel.com/test_sets/01d6b960-bf3f-11e5-8f04-5254006e85c2
            2016-01-21 11:12:25 - https://testing.hpdd.intel.com/test_sets/cd343b46-c061-11e5-a8e5-5254006e85c2
            2016-01-21 13:03:12 - https://testing.hpdd.intel.com/test_sets/c0b04f0e-c070-11e5-956d-5254006e85c2
            2016-01-21 14:41:59 - https://testing.hpdd.intel.com/test_sets/e4cffce0-c07f-11e5-a8e5-5254006e85c2
            2016-01-21 21:40:44 - https://testing.hpdd.intel.com/test_sets/85d45ece-c0bc-11e5-9620-5254006e85c2
            2016-01-22 03:45:40 - https://testing.hpdd.intel.com/test_sets/abf22bb0-c0ec-11e5-8d88-5254006e85c2


            adilger Andreas Dilger added a comment - And I verified that these two failures are on commits that include the fix that was recently landed here.

            People

              yong.fan nasf (Inactive)
              ferner Frederik Ferner (Inactive)