[LU-13070] mdd_orphan_destroy loop caused by compatibility issue on upgrades to 2.11 or later Created: 12/Dec/19 Updated: 21/Jan/20 Resolved: 03/Jan/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.12.4 |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Artem Blagodarenko (Inactive) | Assignee: | Artem Blagodarenko (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
While investigating a customer issue, we found that the original trigger for the problem is a compatibility issue between Lustre 2.11 and older Lustre versions. Code introduced by Here is the relevant code segment from mdd_orphan_delete():
rc = dt_delete(env, mdd->mdd_orphans, key, th);
if (rc == -ENOENT) {
        key = mdd_orphan_key_fill_20(env, mdo2fid(obj));
        rc = dt_delete(env, mdd->mdd_orphans, key, th);
}
This same ENOENT sequence should be included in mdd_orphan_destroy(). It looks like |
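The fallback shown in the mdd_orphan_delete() excerpt can be illustrated with a small standalone sketch. Everything here is a toy stand-in and an assumption for illustration: struct toy_fid, struct toy_index, toy_insert(), toy_delete(), key_fill_new(), key_fill_old() and orphan_delete_compat() are hypothetical names, and the two key formats are invented to show "new" versus "old" naming; they are not claimed to match Lustre's actual on-disk formats or APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8
#define KEY_LEN 64

/* Hypothetical stand-in for a FID; not Lustre's struct lu_fid. */
struct toy_fid {
	unsigned long long seq;
	unsigned int oid;
	unsigned int ver;
};

/* Toy in-memory index standing in for the orphan index. */
struct toy_index {
	char keys[MAX_ENTRIES][KEY_LEN];
	int used[MAX_ENTRIES];
};

/* Store a key in the first free slot; -ENOSPC if the index is full. */
int toy_insert(struct toy_index *idx, const char *key)
{
	for (int i = 0; i < MAX_ENTRIES; i++) {
		if (!idx->used[i]) {
			snprintf(idx->keys[i], KEY_LEN, "%s", key);
			idx->used[i] = 1;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Remove an exact key match; -ENOENT if no entry has that key. */
int toy_delete(struct toy_index *idx, const char *key)
{
	for (int i = 0; i < MAX_ENTRIES; i++) {
		if (idx->used[i] && strcmp(idx->keys[i], key) == 0) {
			idx->used[i] = 0;
			return 0;
		}
	}
	return -ENOENT;
}

/* Illustrative "new" key format (assumption, not Lustre's real format). */
void key_fill_new(char *buf, const struct toy_fid *fid)
{
	snprintf(buf, KEY_LEN, "0x%llx:0x%x:0x%x", fid->seq, fid->oid, fid->ver);
}

/* Illustrative "old" key format (assumption, not Lustre's real format). */
void key_fill_old(char *buf, const struct toy_fid *fid)
{
	snprintf(buf, KEY_LEN, "%016llx:%08x:%08x", fid->seq, fid->oid, fid->ver);
}

/* Try the new-format key first; on -ENOENT retry with the old-format key. */
int orphan_delete_compat(struct toy_index *idx, const struct toy_fid *fid)
{
	char key[KEY_LEN];
	int rc;

	assert(idx != NULL && fid != NULL);

	key_fill_new(key, fid);
	rc = toy_delete(idx, key);
	if (rc == -ENOENT) {
		key_fill_old(key, fid);
		rc = toy_delete(idx, key);
	}
	return rc;
}
```

In this sketch, an entry that was inserted under the old-format key is still found by orphan_delete_compat(), whereas a delete that only tried the new-format key would return -ENOENT.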
| Comments |
| Comment by Gerrit Updater [ 17/Dec/19 ] |
|
Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/37049 |
| Comment by Gerrit Updater [ 03/Jan/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37049/ |
| Comment by Peter Jones [ 03/Jan/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 03/Jan/20 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37129 |
| Comment by Gerrit Updater [ 10/Jan/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37129/ |
| Comment by Yang Sheng [ 18/Jan/20 ] |
|
Hi, Artem, After reconsidering: mdd_orphan_key_fill_20 is used in mdd_orphan_delete because the key there comes from mdd_orphan_key_fill, so a compatibility issue exists. But the key in mdd_orphan_destroy comes from the osd iterator and is not generated from the fid, so the compatibility issue does not exist there at all. Is that right? Thanks, |
| Comment by Artem Blagodarenko (Inactive) [ 21/Jan/20 ] |
|
Hello Yang,
The situation is the same in mdd_orphan_destroy(). mdd_orphan_key_fill() is executed in the mdd_orphan_destroy->mdd_orphan_declare_delete() code path, and this "filled" name is then used. The first symptom of this issue we noticed was the message:
[Mon Nov 25 10:01:42 2019] LustreError: 112248:0:(mdd_orphans.c:324:mdd_orphan_destroy()) snx11168-MDD0003: could not delete orphan [0x2c012648d:0x579f:0x0]: rc = -2
I believe ENOENT is returned because the wrong fid is parsed from the filename: the filename is in the old format, so mdd_orphan_key_fill_20() needs to be used.
Best regards, Artem Blagodarenko.
|
| Comment by Yang Sheng [ 21/Jan/20 ] |
|
Hi, Artem, Please note that the key in mdd_orphan_destroy comes from ent->lde_name in mdd_orphan_index_iterate->mdd_orphan_key_test_and_delete. mdd_orphan_declare_delete only fills env->mti_key, so mdd_orphan_key_fill_20 is useless there. Thanks, |
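The point about the key coming from the iterator can be sketched as follows. This is a minimal toy model, not Lustre code: struct toy_index, toy_delete() and toy_destroy_all() are hypothetical names, and the two entry names are made-up examples of differently formatted keys. It only shows that when the delete key is the stored entry name itself, the lookup matches regardless of which format the name was written in.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 4
#define NAME_LEN 64

/* Toy in-memory index; each used slot holds one entry name. */
struct toy_index {
	char names[MAX_ENTRIES][NAME_LEN];
	int used[MAX_ENTRIES];
};

/* Remove an exact name match; -ENOENT if no entry has that name. */
int toy_delete(struct toy_index *idx, const char *name)
{
	for (int i = 0; i < MAX_ENTRIES; i++) {
		if (idx->used[i] && strcmp(idx->names[i], name) == 0) {
			idx->used[i] = 0;
			return 0;
		}
	}
	return -ENOENT;
}

/*
 * Walk every entry and delete it using the name read back from the
 * entry itself. Returns the number of entries deleted, or a negative
 * errno on the first failure.
 */
int toy_destroy_all(struct toy_index *idx)
{
	int deleted = 0;

	assert(idx != NULL);

	for (int i = 0; i < MAX_ENTRIES; i++) {
		char name[NAME_LEN];
		int rc;

		if (!idx->used[i])
			continue;

		/* The key is whatever the iteration returned, not a regenerated one. */
		snprintf(name, sizeof(name), "%s", idx->names[i]);
		rc = toy_delete(idx, name);
		if (rc != 0)
			return rc;
		deleted++;
	}
	return deleted;
}
```

Because the name is never rebuilt from a fid in this model, no format-specific fallback is involved in the delete itself.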
| Comment by Artem Blagodarenko (Inactive) [ 21/Jan/20 ] |
|
Hello Yang,
You are right. The key comes from the directory entry names and should be correct. We can delete this useless code. Do you think we need to fix mdd_orphan_declare_delete() and add a call to the compatibility version of mdd_orphan_key_fill() there?
Thanks, Artem Blagodarenko. |
| Comment by Yang Sheng [ 21/Jan/20 ] |
|
Hi, Artem, It looks like the key in mdd_orphan_declare_delete is only used on zfs to check for the '.' or '..' case, so we don't need to fix it. Thanks, |