[LU-8046] LFSCK does not properly check LOV_MAGIC
Created: 20/Apr/16  Updated: 21/Apr/16  Resolved: 20/Apr/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Andreas Dilger
Assignee: nasf (Inactive)
Resolution: Not a Bug
Votes: 0
Labels: lfsck

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In the LFSCK layout checking code, several places check for one magic (e.g. LOV_MAGIC_V1) and, if it is not found, assume that the layout is LOV_MAGIC_V3. This will break when new layout types such as PFL or RAID-1 are added.

The LFSCK code should explicitly check that lmm_magic is a known value. If it is not known, LFSCK should either skip the layout (for magics matching LOV_MAGIC_MAGIC) or consider the layout to be corrupt (for magics that don't match LOV_MAGIC_MAGIC).
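
A minimal sketch of the difference between the implicit "not V1, so V3" check and an explicit check, written as standalone C. The magic values are the ones commonly found in lustre_user.h and should be treated as assumptions to verify against the tree; enum layout_kind, kind_fragile() and kind_explicit() are hypothetical names used only for illustration, not LFSCK functions.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values, as commonly defined in lustre_user.h; verify in your tree. */
#define LOV_MAGIC_MAGIC 0x0BD0u
#define LOV_MAGIC_MASK  0xFFFFu
#define LOV_MAGIC_V1    (0x0BD10000u | LOV_MAGIC_MAGIC)
#define LOV_MAGIC_V3    (0x0BD30000u | LOV_MAGIC_MAGIC)

/* Hypothetical labels for which parser a layout needs. */
enum layout_kind { LAYOUT_V1, LAYOUT_V3 };

/* Fragile: anything that is not V1 is silently parsed as V3. */
static enum layout_kind kind_fragile(uint32_t magic)
{
        return magic == LOV_MAGIC_V1 ? LAYOUT_V1 : LAYOUT_V3;
}

/* Explicit: only known magics are accepted. An unknown magic is either
 * unsupported (low bits match LOV_MAGIC_MAGIC) or invalid (they do not). */
static int kind_explicit(uint32_t magic, enum layout_kind *kind)
{
        switch (magic) {
        case LOV_MAGIC_V1:
                *kind = LAYOUT_V1;
                return 0;
        case LOV_MAGIC_V3:
                *kind = LAYOUT_V3;
                return 0;
        default:
                if ((magic & LOV_MAGIC_MASK) == LOV_MAGIC_MAGIC)
                        return -EOPNOTSUPP;
                return -EINVAL;
        }
}
```

With a future magic such as 0x0BD60BD0, kind_fragile() would report LAYOUT_V3, while kind_explicit() returns -EOPNOTSUPP and leaves *kind untouched.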



 Comments   
Comment by nasf (Inactive) [ 20/Apr/16 ]

In fact, we have already done that, as you can see in lfsck_layout_verify_header():

static int lfsck_layout_verify_header(struct lov_mds_md_v1 *lmm)
{
        __u32 magic;
        __u32 pattern;

        magic = le32_to_cpu(lmm->lmm_magic);
        /* If magic crashed, keep it there. Sometime later, during OST-object
         * orphan handling, if some OST-object(s) back-point to it, it can be
         * verified and repaired. */
        if (magic != LOV_MAGIC_V1 && magic != LOV_MAGIC_V3) {
                struct ost_id   oi;
                int             rc;

                lmm_oi_le_to_cpu(&oi, &lmm->lmm_oi);
                if ((magic & LOV_MAGIC_MASK) == LOV_MAGIC_MAGIC)
                        rc = -EOPNOTSUPP;
                else
                        rc = -EINVAL;

                CDEBUG(D_LFSCK, "%s LOV EA magic %u on "DOSTID"\n",
                       rc == -EINVAL ? "Unknown" : "Unsupported",
                       magic, POSTID(&oi));

                return rc;
        }

        pattern = le32_to_cpu(lmm->lmm_pattern);
        /* XXX: currently, we only support LOV_PATTERN_RAID0. */
        if (lov_pattern(pattern) != LOV_PATTERN_RAID0) {
                struct ost_id oi;

                lmm_oi_le_to_cpu(&oi, &lmm->lmm_oi);
                CDEBUG(D_LFSCK, "Unsupported LOV EA pattern %u on "DOSTID"\n",
                       pattern, POSTID(&oi));

                return -EOPNOTSUPP;
        }

        return 0;
}

Before the caller handles the LOV EA entries, it first calls lfsck_layout_verify_header() to check the magic. Currently, only _V1 and _V3 are recognised. If the magic is unknown but matches LOV_MAGIC_MAGIC (after masking with LOV_MAGIC_MASK), LFSCK keeps the LOV EA as it is and skips it; otherwise, LFSCK regards the LOV EA as corrupted and rebuilds it from the OST-object(s).
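
The caller-side handling described above can be sketched roughly as a mapping from the return code of lfsck_layout_verify_header() to an action. This is only an illustration: enum lov_ea_action and lov_ea_action_for() are hypothetical names, and the real LFSCK code paths are more involved. Treating any unexpected error as "skip" is a conservative assumption of this sketch, not something stated in the ticket.

```c
#include <errno.h>

/* Hypothetical actions a caller might take for one LOV EA. */
enum lov_ea_action {
        LOV_EA_PROCESS,         /* known magic and pattern: check the stripes */
        LOV_EA_SKIP,            /* unsupported but plausible: leave it alone */
        LOV_EA_TREAT_CORRUPT,   /* invalid magic: rebuild from OST-object(s) */
};

/* Map the verify_header() return code to an action. */
static enum lov_ea_action lov_ea_action_for(int verify_rc)
{
        switch (verify_rc) {
        case 0:
                return LOV_EA_PROCESS;
        case -EINVAL:
                return LOV_EA_TREAT_CORRUPT;
        case -EOPNOTSUPP:
        default:
                /* Assumption: anything unexpected is skipped, not repaired. */
                return LOV_EA_SKIP;
        }
}
```

Note that -EOPNOTSUPP covers both an unsupported magic and an unsupported pattern in the function quoted above, so both cases end up skipped in this sketch.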

Comment by nasf (Inactive) [ 20/Apr/16 ]

The issue has already been resolved.

Comment by Andreas Dilger [ 20/Apr/16 ]

If this is the case for older Lustre versions, there is no reason to even have LMAI_PFL set on the file I think? The client and MDS will not understand the LOV_MAGIC_COMP and ignore the file, but such a file should be allowed to be deleted with an old server, even if it means the OST objects are lost (LFSCK will clean them up). Having LMAI_PFL set on the file means it cannot even be deleted I think.

Comment by nasf (Inactive) [ 21/Apr/16 ]

What is the relationship with the LMAI_PFL flag? As I understand it, if the old server does not recognise the new LMAI_PFL flag, it should skip the file directly, which means the old server cannot remove a PFL file. But if LMAI_PFL is not set for some reason, the old server still cannot remove the PFL file, because it will fail while handling the LOV EA with the unknown LOV_MAGIC_COMP magic. So what can I do for this ticket? Sorry, I am somewhat confused.
