
LFSCK does not properly check LOV_MAGIC

Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • None
    • Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0

    Description

      In the LFSCK layout checking code, several places check for a single magic (e.g. LOV_MAGIC_V1) and, if it isn't found, assume that the layout is LOV_MAGIC_V3. This will break when new layout types such as PFL or RAID-1 are added.

      The LFSCK code should explicitly check that lmm_magic is a known value. If it is not known, LFSCK should either skip the layout (for magics matching LOV_MAGIC_MAGIC) or consider the layout to be corrupt (for magics that don't match LOV_MAGIC_MAGIC).
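      The three-way check requested above can be sketched as a small standalone classifier. This is only an illustration, not the LFSCK code: classify_lov_magic() is a hypothetical helper, and the constant values are assumed to mirror the Lustre headers and should be checked against the real definitions before use.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative constants; the values are assumed to match the Lustre
 * headers and are redefined here only to keep the sketch standalone. */
#define LOV_MAGIC_MAGIC 0x0BD0u
#define LOV_MAGIC_MASK  0xFFFFu
#define LOV_MAGIC_V1    (0x0BD10000u | LOV_MAGIC_MAGIC)
#define LOV_MAGIC_V3    (0x0BD30000u | LOV_MAGIC_MAGIC)

/* Hypothetical helper: classify a CPU-endian lmm_magic value.
 *   0            known layout, safe to process
 *   -EOPNOTSUPP  looks like a Lustre layout, but this code does not
 *                understand it, so it should be skipped
 *   -EINVAL      not a Lustre magic at all, treat the layout as corrupt */
static int classify_lov_magic(uint32_t magic)
{
	switch (magic) {
	case LOV_MAGIC_V1:
	case LOV_MAGIC_V3:
		return 0;
	default:
		break;
	}

	if ((magic & LOV_MAGIC_MASK) == LOV_MAGIC_MAGIC)
		return -EOPNOTSUPP;

	return -EINVAL;
}
```

      The point of the explicit switch is that an unrecognised magic is never silently treated as LOV_MAGIC_V3; a newer layout type falls through to the mask test instead.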


        Activity

          [LU-8046] LFSCK does not properly check LOV_MAGIC
          yong.fan nasf (Inactive) added a comment (edited):

          How does this relate to the LMAI_PFL flag? As I understand it, if an old server does not recognise the new LMAI_PFL flag, it should skip the file directly, which means the old server cannot remove a PFL file. But even if for some reason LMAI_PFL is not set, the old server still cannot remove the PFL file, because it will fail while handling the LOV EA with the unknown LOV_MAGIC_COMP magic. So what should I do for this ticket? Sorry, I am somewhat confused.

          adilger Andreas Dilger added a comment:

          If this is the case for older Lustre versions, then I think there is no reason to even have LMAI_PFL set on the file. The client and MDS will not understand LOV_MAGIC_COMP and will ignore the file, but an old server should still be allowed to delete such a file, even if it means the OST objects are lost (LFSCK will clean them up). I think having LMAI_PFL set on the file means it cannot even be deleted.

          yong.fan nasf (Inactive) added a comment:

          The issue has already been resolved.

          yong.fan nasf (Inactive) added a comment:

          In fact, we have already done that, as you can see in lfsck_layout_verify_header():

          static int lfsck_layout_verify_header(struct lov_mds_md_v1 *lmm)
          {
                  __u32 magic;
                  __u32 pattern;

                  magic = le32_to_cpu(lmm->lmm_magic);
                  /* If magic crashed, keep it there. Sometime later, during OST-object
                   * orphan handling, if some OST-object(s) back-point to it, it can be
                   * verified and repaired. */
                  if (magic != LOV_MAGIC_V1 && magic != LOV_MAGIC_V3) {
                          struct ost_id   oi;
                          int             rc;

                          lmm_oi_le_to_cpu(&oi, &lmm->lmm_oi);
                          if ((magic & LOV_MAGIC_MASK) == LOV_MAGIC_MAGIC)
                                  rc = -EOPNOTSUPP;
                          else
                                  rc = -EINVAL;

                          CDEBUG(D_LFSCK, "%s LOV EA magic %u on "DOSTID"\n",
                                 rc == -EINVAL ? "Unknown" : "Unsupported",
                                 magic, POSTID(&oi));

                          return rc;
                  }

                  pattern = le32_to_cpu(lmm->lmm_pattern);
                  /* XXX: currently, we only support LOV_PATTERN_RAID0. */
                  if (lov_pattern(pattern) != LOV_PATTERN_RAID0) {
                          struct ost_id oi;

                          lmm_oi_le_to_cpu(&oi, &lmm->lmm_oi);
                          CDEBUG(D_LFSCK, "Unsupported LOV EA pattern %u on "DOSTID"\n",
                                 pattern, POSTID(&oi));

                          return -EOPNOTSUPP;
                  }

                  return 0;
          }

          Before the caller handles the LOV EA entries, it calls lfsck_layout_verify_header() to check the magic first. Currently, only LOV_MAGIC_V1 and LOV_MAGIC_V3 are recognised. If the magic is unknown but matches LOV_MAGIC_MAGIC, LFSCK keeps the LOV EA as it is and skips it; otherwise, LFSCK regards the LOV EA as corrupted and rebuilds it from the OST-object(s).
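          As a rough illustration of the caller-side behaviour described in this comment, the mapping from the return code of lfsck_layout_verify_header() to a follow-up action could look like the sketch below. The enum and the function lov_ea_action_for() are hypothetical names invented for this example; the real LFSCK caller is not shown in this ticket and may be organised differently.

```c
#include <errno.h>

/* Hypothetical action codes; these names do not exist in Lustre. */
enum lov_ea_action {
	LOV_EA_CHECK,	/* header is fine, go on to verify the LOV EA entries */
	LOV_EA_SKIP,	/* unsupported magic or pattern, leave the LOV EA alone */
	LOV_EA_REBUILD,	/* unknown magic, LOV EA is treated as corrupted and
			 * is to be rebuilt from the OST-object(s) */
	LOV_EA_FAIL	/* any other error is passed up unchanged */
};

/* Map the header verification result to what the caller does next. */
static enum lov_ea_action lov_ea_action_for(int rc)
{
	switch (rc) {
	case 0:
		return LOV_EA_CHECK;
	case -EOPNOTSUPP:
		return LOV_EA_SKIP;
	case -EINVAL:
		return LOV_EA_REBUILD;
	default:
		return LOV_EA_FAIL;
	}
}
```

          Note that -EOPNOTSUPP covers both an unsupported magic and an unsupported pattern in the quoted function, so both cases end up skipped in this sketch.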

          People

            yong.fan nasf (Inactive)
            adilger Andreas Dilger