  Lustre / LU-13070

mdd_orphan_destroy loop caused by compatibility issue on upgrades to 2.11 or later


    Description

      While investigating a customer issue, we found that the original trigger for the problem is a compatibility issue between Lustre 2.11 and older Lustre versions. The code introduced by LU-7787 to "clean up orphan object handling" was incomplete. Lustre 2.11 changed the name format for orphans in the PENDING directory, and mdd_orphan_destroy() in 2.11 does not recognize names in the old format, which leads to an endless loop. mdd_orphan_delete() has a check for the old-format name, but that check was not added to mdd_orphan_destroy().
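      To make the naming mismatch concrete, here is a minimal sketch of two key formatters. The exact on-disk formats are not given in this report, so both formats below are assumptions chosen only for illustration: the hypothetical toy_key_fill() stands in for the 2.11+ naming produced by mdd_orphan_key_fill(), and the hypothetical toy_key_fill_20() stands in for the older naming handled by mdd_orphan_key_fill_20(). struct toy_fid is a simplified, hypothetical stand-in for a Lustre FID.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a Lustre FID (sequence, object id, version). */
struct toy_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* ASSUMED new-style (2.11+) orphan name: "0x<seq>:0x<oid>:0x<ver>". */
static void toy_key_fill(char *buf, size_t len, const struct toy_fid *f)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "0x%llx:0x%x:0x%x",
		 (unsigned long long)f->seq, (unsigned int)f->oid,
		 (unsigned int)f->ver);
}

/* ASSUMED old-style (pre-2.11) orphan name: zero-padded hex plus an op suffix. */
static void toy_key_fill_20(char *buf, size_t len, const struct toy_fid *f)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%016llx:%08x:%08x:UNLINK",
		 (unsigned long long)f->seq, (unsigned int)f->oid,
		 (unsigned int)f->ver);
}
```

      For the same FID the two helpers produce different names, so a lookup that only tries one format misses entries that were written in the other.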

      Here is the relevant code segment from mdd_orphan_delete():

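      /* Try the key in the current (2.11+) name format first. */
      rc = dt_delete(env, mdd->mdd_orphans, key, th);
      if (rc == -ENOENT) {
              /* Not found: retry with the old (pre-2.11) name format. */
              key = mdd_orphan_key_fill_20(env, mdo2fid(obj));
              rc = dt_delete(env, mdd->mdd_orphans, key, th);
      }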

      The same -ENOENT fallback should be added to mdd_orphan_destroy().
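      A minimal, self-contained sketch of that fallback, modelled on the mdd_orphan_delete() fragment above. It does not use any Lustre API: toy_index, toy_delete() (playing the role of dt_delete()), the two name formats, and toy_orphan_destroy() are all hypothetical, and the real fix in mdd_orphan_destroy() may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8
#define TOY_NAME_LEN    64

/* Hypothetical, simplified stand-in for a Lustre FID. */
struct toy_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Hypothetical in-memory stand-in for the PENDING directory index. */
struct toy_index {
	char names[TOY_MAX_ENTRIES][TOY_NAME_LEN];
	int  used[TOY_MAX_ENTRIES];
};

/* Add a name to the first free slot; -ENOSPC when the toy index is full. */
static int toy_insert(struct toy_index *idx, const char *name)
{
	for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
		if (!idx->used[i]) {
			snprintf(idx->names[i], TOY_NAME_LEN, "%s", name);
			idx->used[i] = 1;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Plays the role of dt_delete(): 0 on success, -ENOENT if the name is absent. */
static int toy_delete(struct toy_index *idx, const char *name)
{
	for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
		if (idx->used[i] && strcmp(idx->names[i], name) == 0) {
			idx->used[i] = 0;
			return 0;
		}
	}
	return -ENOENT;
}

/* ASSUMED new-style (2.11+) name format. */
static void toy_key_fill(char *buf, size_t len, const struct toy_fid *f)
{
	snprintf(buf, len, "0x%llx:0x%x:0x%x", (unsigned long long)f->seq,
		 (unsigned int)f->oid, (unsigned int)f->ver);
}

/* ASSUMED old-style (pre-2.11) name format. */
static void toy_key_fill_20(char *buf, size_t len, const struct toy_fid *f)
{
	snprintf(buf, len, "%016llx:%08x:%08x:UNLINK", (unsigned long long)f->seq,
		 (unsigned int)f->oid, (unsigned int)f->ver);
}

/*
 * Sketch of the fallback proposed for mdd_orphan_destroy(): delete using the
 * new-format key and, only on -ENOENT, retry with the old-format key.
 */
static int toy_orphan_destroy(struct toy_index *idx, const struct toy_fid *fid)
{
	char key[TOY_NAME_LEN];
	int rc;

	assert(idx != NULL && fid != NULL);
	toy_key_fill(key, sizeof(key), fid);
	rc = toy_delete(idx, key);
	if (rc == -ENOENT) {
		toy_key_fill_20(key, sizeof(key), fid);
		rc = toy_delete(idx, key);
	}
	return rc;
}
```

      With the retry in place, an entry written under the old format is removed on the second attempt instead of failing with -ENOENT, while an entry in the new format is still removed on the first attempt.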

      LU-11418 appears to address this problem, but it removes the symptoms rather than the root cause.


          Activity

            Yang Sheng added a comment -

            Hi, Artem,

            It looks like the key in mdd_orphan_declare_delete() is only used by ZFS, to check for the '.' or '..' case, so we don't need to fix it.

            Thanks,
            YangSheng


            Artem Blagodarenko (Inactive) added a comment -

            Hello Yang,

            You are right. The key comes from the directory entry names, so it should be correct. We can delete this unnecessary code.

            Do you think we need to fix mdd_orphan_declare_delete() and add a call to the compatibility version of mdd_orphan_key_fill() (mdd_orphan_key_fill_20()) there?

            Thanks,

            Artem Blagodarenko.

            Yang Sheng added a comment -

            Hi, Artem,

            Please note that the key in mdd_orphan_destroy() comes from ent->lde_name via mdd_orphan_index_iterate()->mdd_orphan_key_test_and_delete(). mdd_orphan_declare_delete() only fills env->mti_key, so mdd_orphan_key_fill_20() is of no use there.

            Thanks,
            YangSheng


            Artem Blagodarenko (Inactive) added a comment -

            Hello Yang,

            The same applies to mdd_orphan_destroy(): mdd_orphan_key_fill() is executed on the mdd_orphan_destroy()->mdd_orphan_declare_delete() code path, and this "filled" name is then used.

            The first symptom of this issue we noticed was the message:

             [Mon Nov 25 10:01:42 2019] LustreError: 112248:0:(mdd_orphans.c:324:mdd_orphan_destroy()) snx11168-MDD0003: could not delete orphan [0x2c012648d:0x579f:0x0]: rc = -2

            I believe -ENOENT is returned because a wrong FID is parsed from a filename in the old format, so mdd_orphan_key_fill_20() needs to be used.

            Best regards,

            Artem Blagodarenko.
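            A minimal sketch of the failure mode discussed in this exchange, under assumed name formats (the real ones are not given in this thread). If the name returned by the directory iterator is used as-is, an old-format entry is deleted; if the name is first parsed into a FID and re-encoded with the new-format helper, the key no longer matches the stored name and the delete returns -ENOENT. Every identifier here (toy_fid, toy_key_fill(), toy_name_to_fid(), toy_delete_one() and the two destroy paths) is hypothetical and is not a Lustre API. Under these assumed formats the FID itself happens to parse correctly and only the re-encoded name differs; with the real formats the parsed FID may also be wrong, as suggested above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a Lustre FID. */
struct toy_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* ASSUMED new-style (2.11+) name format. */
static void toy_key_fill(char *buf, size_t len, const struct toy_fid *f)
{
	snprintf(buf, len, "0x%llx:0x%x:0x%x", (unsigned long long)f->seq,
		 (unsigned int)f->oid, (unsigned int)f->ver);
}

/* Parse the leading "<seq>:<oid>:<ver>" hex fields; any suffix is ignored. */
static int toy_name_to_fid(const char *name, struct toy_fid *f)
{
	unsigned long long seq;
	unsigned int oid, ver;

	if (sscanf(name, "%llx:%x:%x", &seq, &oid, &ver) != 3)
		return -EINVAL;
	f->seq = seq;
	f->oid = oid;
	f->ver = ver;
	return 0;
}

/* One-entry stand-in for the PENDING directory: delete by exact name. */
static int toy_delete_one(char *stored, const char *key)
{
	if (stored[0] != '\0' && strcmp(stored, key) == 0) {
		stored[0] = '\0';
		return 0;
	}
	return -ENOENT;
}

/* Path A: delete with the name exactly as the iterator returned it. */
static int toy_destroy_by_name(char *stored, const char *iter_name)
{
	return toy_delete_one(stored, iter_name);
}

/* Path B: name -> FID -> new-format key, then delete with that key. */
static int toy_destroy_by_regenerated_key(char *stored, const char *iter_name)
{
	struct toy_fid f;
	char key[64];
	int rc = toy_name_to_fid(iter_name, &f);

	if (rc != 0)
		return rc;
	toy_key_fill(key, sizeof(key), &f);
	return toy_delete_one(stored, key);
}
```

            Which of these two paths mdd_orphan_destroy() actually takes is what decides whether old-format names can produce the -ENOENT shown in the log message.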

             

            Yang Sheng added a comment -

            Hi, Artem,

            After reconsidering: mdd_orphan_key_fill_20() is used in mdd_orphan_delete() because the key there comes from mdd_orphan_key_fill(), so a compatibility issue does exist on that path. But the key in mdd_orphan_destroy() comes from the OSD iterator and is not generated from the FID, so the compatibility issue does not exist there at all. Isn't that right?

            Thanks,
            YangSheng


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37129/
            Subject: LU-13070 mdd: try old format for orphan names during recovery
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: b75f04d5855d7ac4de98fe89686ae685c19c2f97


            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37129
            Subject: LU-13070 mdd: try old format for orphan names during recovery
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 774e8858dd86d265a3c2a0f8b6efdb93e2d82d11

            Peter Jones added a comment -

            Landed for 2.14


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37049/
            Subject: LU-13070 mdd: try old format for orphan names during recovery
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 05fca4be33067f24a02e527c88cff5b60a20bb39


            Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/37049
            Subject: LU-13070 mdd: try old format for orphan names during recovery
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: aec5021399cda37e732a30d0981db7ecd6e86444


            People

              Artem Blagodarenko (Inactive)
              Artem Blagodarenko (Inactive)
              Votes: 0
              Watchers: 4
