
Directories gone missing after 2.4 update

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.5.0, Lustre 2.4.2
    • Lustre 2.4.1
    • lustre 2.4.0-17chaos (github.com/chaos/lustre)
    • 3
    • 10401

    Description

      After upgrading our servers from 2.1 to 2.4, our MDS crashed on LU-2842, and we applied the patch. That patch avoided the LBUG, but it is now clear that there is a more basic problem: we can no longer look up many of the top-level subdirectories in this Lustre filesystem.

      We are seeing problems like:

      2013-09-11 13:01:22 LustreError: 5570:0:(mdt_open.c:1687:mdt_reint_open()) lsc-MDT0000: name purgelogs present, but fid [0x2830891e:0xd1781321:0x0] invalid

      It looks to me like the directory entries are still there, but FID lookups do not work on them. We verified that the directory named "purgelogs" appears on the underlying ldiskfs filesystem at ROOT/purgelogs.
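      To collect the affected entries from a large MDS log, the name and FID can be pulled out of each "present, but fid ... invalid" message. The following is a minimal sketch, not part of the original report; the helper names (`Fid`, `parse_invalid_fid`) are hypothetical, and it assumes the message layout shown in the log line above, with the FID printed as a bracketed sequence:object-id:version triple.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Fid(NamedTuple):
    """A FID as printed in the log: [sequence:object id:version]."""
    seq: int
    oid: int
    ver: int


# Assumed message layout, taken from the single log line quoted in this ticket:
#   ... name <entry> present, but fid [0x...:0x...:0x...] invalid
INVALID_FID_RE = re.compile(
    r"name (\S+) present, but fid "
    r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\] invalid"
)


def parse_invalid_fid(line: str) -> Optional[Tuple[str, Fid]]:
    """Return (entry name, FID) for a matching log line, else None."""
    match = INVALID_FID_RE.search(line)
    if match is None:
        return None
    name = match.group(1)
    seq, oid, ver = (int(field, 16) for field in match.groups()[1:])
    return name, Fid(seq, oid, ver)


sample = (
    "2013-09-11 13:01:22 LustreError: 5570:0:(mdt_open.c:1687:mdt_reint_open()) "
    "lsc-MDT0000: name purgelogs present, but fid "
    "[0x2830891e:0xd1781321:0x0] invalid"
)
result = parse_invalid_fid(sample)
```

      Running this over every line of the console log would give a list of directory names whose entries exist but whose FIDs fail lookup, which can then be compared against the contents of ROOT/ on the ldiskfs filesystem.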

      We also see error messages during recovery, shortly after the recent boot, like the following:

      2013-09-11 12:58:27 sumom-mds1 login: LustreError: 4164:0:(mdt_open.c:1497:mdt_reint_open()) @@@ [0x24d18001:0x3db440f0:0x0]/XXXXXX->[0x24d98604:0x2a32454:0x0] cr_flags=0104200200001 mode=0200100000 msg_flag=0x4 not found in open replay.  req@ffff8808263d1000 x1443453865661288/t0(463856618502) o101->f45d6fab-2c9c-6b39-0090-4935fbe03e32@192.168.115.87@o2ib10:0/0 lens 568/1176 e 0 to 0 dl 1378929568 ref 1 fl Interpret:/4/0 rc 0/0

      (I X'ed out the user name there, but everything else is cut-and-paste.)

      Any ideas on the next step to get these directories accessible again?

          Activity

            [LU-3934] Directories gone missing after 2.4 update
            pjones Peter Jones added a comment -

            Closing as LLNL have pulled the fix(es) into their release and the fix is landed for 2.5.0


            yong.fan nasf (Inactive) added a comment -

            6515 has already landed on b2_4, but not on b2_4_0, so you need to backport 6515 to b2_4_0 and then apply 7625.

            morrone Christopher Morrone (Inactive) added a comment -

            http://review.whamcloud.com/#/c/6515/ also landed on b2_4, and you therefore based http://review.whamcloud.com/#/c/7625/ on that. 6515 does not apply cleanly without 7625. I'll just take both.

            yong.fan nasf (Inactive) added a comment -

            First, you need this patch (http://review.whamcloud.com/#/c/7625/) on Lustre-2.4 to resolve LU-3934.

            Then, if possible, please also consider the patch (http://review.whamcloud.com/#/c/6515/), which mainly focuses on triggering OI scrub properly under DNE mode. The patch is based on master (Lustre-2.5). I am not sure whether it can be applied directly on top of your patch stack; please try. If it cannot, we can back-port it.

            morrone Christopher Morrone (Inactive) added a comment -

            It looks like the 2.4 patch assumes the existence of this patch:

            448a0fb 2013-08-08 LU-3420 scrub: trigger OI scrub properly

            which did not exist at 2.4.0. I assume you are suggesting that I cherry-pick that as well?

            jlevi Jodi Levi (Inactive) added a comment -

            Patch landed to Master, so closing ticket. Please let me know if anything additional is needed and I will reopen.

            yong.fan nasf (Inactive) added a comment -

            The patch for master to detect the upgrade:

            http://review.whamcloud.com/#/c/7719/

            People

              yong.fan nasf (Inactive)
              morrone Christopher Morrone (Inactive)