[LU-8046] LFSCK does not properly check LOV_MAGIC Created: 20/Apr/16 Updated: 21/Apr/16 Resolved: 20/Apr/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | nasf (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | lfsck | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In the LFSCK layout checking code, several places check for e.g. LOV_MAGIC_V1 and, if this isn't found, assume that the layout is LOV_MAGIC_V3. However, this will break when new layout types are added, like PFL, RAID-1, etc. The LFSCK code should explicitly check that lmm_magic is a known value, and if it is not known, either skip it (for magics matching LOV_MAGIC_MAGIC) or consider the layout to be corrupt (for magics that don't match LOV_MAGIC_MAGIC). |
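To make the concern concrete, here is a minimal standalone sketch contrasting the "not V1, so assume V3" pattern with an explicit three-way check. It is not taken from the Lustre tree: the enum and both function names are hypothetical, and the numeric magic values are assumptions chosen for illustration that should be checked against the real Lustre headers.

```c
#include <stdint.h>

/* Assumed values for illustration only; verify against the Lustre headers. */
#define LOV_MAGIC_MAGIC 0x0BD0u
#define LOV_MAGIC_MASK  0xFFFFu
#define LOV_MAGIC_V1    (0x0BD10000u | LOV_MAGIC_MAGIC)
#define LOV_MAGIC_V3    (0x0BD30000u | LOV_MAGIC_MAGIC)

/* Hypothetical classification result, not a Lustre type. */
enum magic_class {
	MAGIC_V1,
	MAGIC_V3,
	MAGIC_UNSUPPORTED,	/* low 16 bits match LOV_MAGIC_MAGIC, type unknown */
	MAGIC_CORRUPT		/* low 16 bits do not match LOV_MAGIC_MAGIC */
};

/* Fragile pattern: anything that is not V1 is silently treated as V3. */
enum magic_class classify_fragile(uint32_t magic)
{
	return magic == LOV_MAGIC_V1 ? MAGIC_V1 : MAGIC_V3;
}

/* Explicit pattern: only known magics are accepted; the rest are split
 * into "newer but plausible" and "corrupt" by the masked comparison. */
enum magic_class classify_explicit(uint32_t magic)
{
	switch (magic) {
	case LOV_MAGIC_V1:
		return MAGIC_V1;
	case LOV_MAGIC_V3:
		return MAGIC_V3;
	default:
		break;
	}

	if ((magic & LOV_MAGIC_MASK) == LOV_MAGIC_MAGIC)
		return MAGIC_UNSUPPORTED;

	return MAGIC_CORRUPT;
}
```

With these assumed values, a future magic such as 0x0BD60BD0 would be misread as V3 by the fragile variant, while the explicit variant reports it as unsupported and leaves it alone.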
| Comments |
| Comment by nasf (Inactive) [ 20/Apr/16 ] |
|
In fact, we have already done that, as you can see in lfsck_layout_verify_header():

static int lfsck_layout_verify_header(struct lov_mds_md_v1 *lmm)
{
	__u32 magic;
	__u32 pattern;

	magic = le32_to_cpu(lmm->lmm_magic);
	/* If magic crashed, keep it there. Sometime later, during OST-object
	 * orphan handling, if some OST-object(s) back-point to it, it can be
	 * verified and repaired. */
	if (magic != LOV_MAGIC_V1 && magic != LOV_MAGIC_V3) {
		struct ost_id oi;
		int rc;

		lmm_oi_le_to_cpu(&oi, &lmm->lmm_oi);
		if ((magic & LOV_MAGIC_MASK) == LOV_MAGIC_MAGIC)
			rc = -EOPNOTSUPP;
		else
			rc = -EINVAL;

		CDEBUG(D_LFSCK, "%s LOV EA magic %u on "DOSTID"\n",
		       rc == -EINVAL ? "Unknown" : "Unsupported",
		       magic, POSTID(&oi));

		return rc;
	}

	pattern = le32_to_cpu(lmm->lmm_pattern);
	/* XXX: currently, we only support LOV_PATTERN_RAID0. */
	if (lov_pattern(pattern) != LOV_PATTERN_RAID0) {
		struct ost_id oi;

		lmm_oi_le_to_cpu(&oi, &lmm->lmm_oi);
		CDEBUG(D_LFSCK, "Unsupported LOV EA pattern %u on "DOSTID"\n",
		       pattern, POSTID(&oi));

		return -EOPNOTSUPP;
	}

	return 0;
}

Before handling the LOV EA entries, the caller first calls lfsck_layout_verify_header() to check the magic. Currently, only _V1 and _V3 are recognised. If the magic is unknown but matches LOV_MAGIC_MAGIC, the LOV EA is kept as-is and skipped; otherwise, the LFSCK regards the LOV EA as corrupted and rebuilds it from the OST-object(s). |
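The caller-side behaviour described above can be sketched as a small mapping from the return code to an action. This is only an illustration under stated assumptions: the enum and the function lov_ea_action_for() are hypothetical names, not part of the LFSCK code, and the handling of any other error code is a conservative guess rather than something stated in this ticket.

```c
#include <errno.h>

/* Hypothetical actions; the real LFSCK caller is structured differently. */
enum lov_ea_action {
	LOV_EA_PROCESS,		/* header verified, handle the LOV EA entries */
	LOV_EA_SKIP,		/* unsupported but plausible layout, leave it alone */
	LOV_EA_REBUILD_LATER	/* treated as corrupt, repaired from OST-objects later */
};

/* Map the result of a header check such as lfsck_layout_verify_header()
 * to what the caller does next. */
enum lov_ea_action lov_ea_action_for(int verify_rc)
{
	switch (verify_rc) {
	case 0:
		return LOV_EA_PROCESS;
	case -EOPNOTSUPP:
		return LOV_EA_SKIP;
	case -EINVAL:
		return LOV_EA_REBUILD_LATER;
	default:
		/* Assumption: any other error is handled conservatively. */
		return LOV_EA_SKIP;
	}
}
```

Keeping the skip and rebuild cases separate is what lets an unrecognised but well-formed layout survive a scan by an older LFSCK.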
| Comment by nasf (Inactive) [ 20/Apr/16 ] |
|
The issue has already been resolved. |
| Comment by Andreas Dilger [ 20/Apr/16 ] |
|
If this is the case for older Lustre versions, there is no reason to even have LMAI_PFL set on the file I think? The client and MDS will not understand the LOV_MAGIC_COMP and ignore the file, but such a file should be allowed to be deleted with an old server, even if it means the OST objects are lost (LFSCK will clean them up). Having LMAI_PFL set on the file means it cannot even be deleted I think. |
| Comment by nasf (Inactive) [ 21/Apr/16 ] |
|
How does this relate to the LMAI_PFL flag? As I understand it, if the old server does not recognise the new LMAI_PFL flag, it should skip the file directly, which means the old server cannot remove a PFL file. But if for some reason LMAI_PFL is not set, the old server still cannot remove the PFL file, because it will fail while handling the LOV EA with the unknown LOV_MAGIC_COMP magic. So what can I do for this ticket? Sorry, I am somewhat confused. |