Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8533

Problems with ldiskfs patch series selection logic

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • Lustre 2.7.0, Lustre 2.8.0
    • 3
    • 9223372036854775807

    Description

      LU-5276 (commit 995921a5c1e53b4d717de21dd3210406f6833689) introduced a change in the ldiskfs series selection logic. Ostensibly the reason for this was:

      LU-5276 build: handle RHEL ldiskfs series more accurated

      Since RHEL7 change RHEL_RELEASE macro format. So we need
      a unified way to handle it. Change using RHEL_MAJOR &
      RHEL_MINOR to decided which ldiskfs series to be choose.

      The mentioned change in the RHEL_RELEASE macro was, to the best of my knowledge, simply that they started putting quotes around the number. So for instance it went from something like this:

      #define RHEL_RELEASE 200

      to something like this:

      #define RHEL_RELEASE "300"

      It would have been a pretty simple matter to hand either quotes or no quotes. In fact I made such a change locally. I'm not sure why this required a complete change of the logic.

      But the new logic introduced a few problems, and I would argue that it should be reverted.

      First of all, the commit message subject claims that the new method is more accurate than the previous. This is obviously incorrect. The new method allows significantly less accuracy than the previous method.

      In the old method, we could target specific RHEL kernel versions, but in the new method we can only have a single series file per RHEL distro version. If there are kernel updates within a rhel update, there is no longer any way to specify different series files for each kernel. The previous method allowed that.

      Next, the new method absolutely guarantees that Lustre can never build for any new RHEL version that comes out without developer intervention. The old method supported kernel versions and more importantly, ranges of kernel versions. The last range was always X version AND NEWER. So the most recent series file could always be tried with newer kernels. If there was not too much change in the kernel, the series file can be applied cleanly and no developer intervention is needed.

      I would much prefer that we try to have the latest series applied to newer kernels, and fail during patch application time, then to fail to even try to apply any series files at all. (Currently it fails to even try).

      And it is also worth noting that the patch introduced this bug:

      +       6[0-3]) LDISKFS_SERIES="2.6-rhel6.series"       ;;
      

      This range does not work. Square brackets are used as quotes in autoconf's m4 system, so these square brackets disappear. The 2.6-rhel6.series will never be used as written.

      In summary, the logic change introduced these problems:

      1. Less kernel version specificity
      2. Guaranteed failure to build with every minor RHEL update
      3. Broken use of square brackets

      I would very much like to see this fixed before 2.9 comes out.

      Attachments

        Activity

          [LU-8533] Problems with ldiskfs patch series selection logic

          No. I want you to put it back the way it was.

          morrone Christopher Morrone (Inactive) added a comment - No. I want you to put it back the way it was.
          ys Yang Sheng added a comment -

          Hi, Chris,

          So as you comment. Could you submit a patch to change it as you like please?

          Thanks,
          YangSheng

          ys Yang Sheng added a comment - Hi, Chris, So as you comment. Could you submit a patch to change it as you like please? Thanks, YangSheng

          In fact, I don't think it is a good practice to keep multiple series file for single Distro.

          I don't see how that argument is relevant. The previous system did not force anyone to have extra patch series. It just allowed it when it was needed. The current system needlessly makes it more difficult.

          Anyway, Choice ldiskfs patch series depend on RHEL RELNO is more clear and agile. Also it is enough to handle all situations.

          I'm sorry, but that first claim (more clean) is very arguable, and the last two claims (more agile, able to handle all situations) are demonstrably false.

          Explain how the current system handles this situation:

          • Two kernels have RHEL_MAJOR and RHEL_MINOR that are the same. There are changes in ext4 between the two kernels, so the same ldiskfs patch series will not apply to both.

          It can't, right? At least not without adding additional dirty logic on the side. You can only choose one patch series per RHEL_MAJOR+RHEL_MINOR. But there are multiple kernels released with matching RHEL_MAJOR+RHEL_MINOR. RHEL_MAJOR+RHEL_MINOR is not a very unique way of identifying a kernel. While RHEL is unlikely to change significant interaces in the kernel within a single RHEL update, ldiskfs patch series can be derailed by even a simple contextual change in ext4. Those can, and do, happen.

          The old system could easily handle this situation. It was, therefore, the more agile of the two solutions.
          I can only guess that you don't see this as a problem because you think that Intel will just select one of the two kernels, and only maintain the series for that one. But you have to remember that Lustre does not own the kernel. There are people out there, your customers, that can't perfectly align our exact kernel version with the one that the Lustre team at Intel has decided to choose. We have our own schedules and requires that dictate when we change kernels and which kernel versions we use.

          Next, the current system introduces a guaranteed compilation failure every time the RHEL_MAJOR+RHEL_MINOR changes. While it is common to need a new patch series when that happens, it is not guaranteed. The new system has introduced a guaranteed failure into the process where previously there was only a likely failrue. It was possible with the old system that the most recent patch series could apply. And even if we accept that it is very likely that a new patch series will be needed with every new RHEL update, I would rather the build system fail during patch application, and tell me exactly where the first failure is rather than having refuse to even try to apply a patch series.

          I would argue that refusing to apply a patch series without knowing that it will fail is also a less agile approach.

          You have made it so that I can't add kernel support easily for kernels that Intel does not support without adding additional messy configure logic, rewriting the logic entirely, or overwriting the upstream patch series.

          None of those choices are desirable. I want to be able to easily add a new kernel version and patch series if an upstream one does not suite my needs.

          This is not theoretical. I have had to make ldiskfs patch series many, many times. The previous kernel selection logic was designed to meet all of our needs, not just yours.

          You have made things harder for me.

          I'm your customer.

          Please fix it.

          morrone Christopher Morrone (Inactive) added a comment - In fact, I don't think it is a good practice to keep multiple series file for single Distro. I don't see how that argument is relevant. The previous system did not force anyone to have extra patch series. It just allowed it when it was needed. The current system needlessly makes it more difficult. Anyway, Choice ldiskfs patch series depend on RHEL RELNO is more clear and agile. Also it is enough to handle all situations. I'm sorry, but that first claim (more clean) is very arguable, and the last two claims (more agile, able to handle all situations) are demonstrably false. Explain how the current system handles this situation: Two kernels have RHEL_MAJOR and RHEL_MINOR that are the same. There are changes in ext4 between the two kernels, so the same ldiskfs patch series will not apply to both. It can't, right? At least not without adding additional dirty logic on the side. You can only choose one patch series per RHEL_MAJOR+RHEL_MINOR. But there are multiple kernels released with matching RHEL_MAJOR+RHEL_MINOR. RHEL_MAJOR+RHEL_MINOR is not a very unique way of identifying a kernel. While RHEL is unlikely to change significant interaces in the kernel within a single RHEL update, ldiskfs patch series can be derailed by even a simple contextual change in ext4. Those can, and do, happen. The old system could easily handle this situation. It was, therefore, the more agile of the two solutions. I can only guess that you don't see this as a problem because you think that Intel will just select one of the two kernels, and only maintain the series for that one. But you have to remember that Lustre does not own the kernel. There are people out there, your customers , that can't perfectly align our exact kernel version with the one that the Lustre team at Intel has decided to choose. We have our own schedules and requires that dictate when we change kernels and which kernel versions we use. Next, the current system introduces a guaranteed compilation failure every time the RHEL_MAJOR+RHEL_MINOR changes. While it is common to need a new patch series when that happens, it is not guaranteed. The new system has introduced a guaranteed failure into the process where previously there was only a likely failrue. It was possible with the old system that the most recent patch series could apply. And even if we accept that it is very likely that a new patch series will be needed with every new RHEL update, I would rather the build system fail during patch application, and tell me exactly where the first failure is rather than having refuse to even try to apply a patch series. I would argue that refusing to apply a patch series without knowing that it will fail is also a less agile approach. You have made it so that I can't add kernel support easily for kernels that Intel does not support without adding additional messy configure logic, rewriting the logic entirely, or overwriting the upstream patch series. None of those choices are desirable. I want to be able to easily add a new kernel version and patch series if an upstream one does not suite my needs. This is not theoretical. I have had to make ldiskfs patch series many, many times. The previous kernel selection logic was designed to meet all of our needs, not just yours . You have made things harder for me . I'm your customer. Please fix it.
          ys Yang Sheng added a comment -

          I think fragile can hardly be a reason to keep different patch series in single RHEL Update. In fact, I don't think it is a good practice to keep multiple series file for single Distro.
          I said It is NOT harder to implement that apply latest patch series to new kernel in current. Also i think we should avoid to do so. Since it may cause some problem. This is what fragile i mean.
          Anyway, Choice ldiskfs patch series depend on RHEL RELNO is more clear and agile. Also it is enough to handle all situations.

          ys Yang Sheng added a comment - I think fragile can hardly be a reason to keep different patch series in single RHEL Update. In fact, I don't think it is a good practice to keep multiple series file for single Distro. I said It is NOT harder to implement that apply latest patch series to new kernel in current. Also i think we should avoid to do so. Since it may cause some problem. This is what fragile i mean. Anyway, Choice ldiskfs patch series depend on RHEL RELNO is more clear and agile. Also it is enough to handle all situations.

          As you say, ldiskfs is fragile. It is entirely possible that the a patch series will not apply to all of the kernels released during a single RHEL Update.

          Don't confuse capability with requirements. Just because we have the capability to support multiple kernels within a single RHEL update doesn't mean we have to have an explosion of patch series. We can still choose exactly when we choose. However should we ever choose to deal with that problem and have two patch series within an update, the new method prevents us from doing it.

          Of course, assign latest series to new kernel is not harder in current implement.

          Yes, it is harder in the current scheme. That capability was built into the old scheme, but in the current scheme we would have to hack additional logic on the side.

          This rewrite smells like the classic Not-Invented-Here-Syndrome mistake. The problem was handling quotes, so instead of making the simple change to handle quotes, the whole section was rewritten with less capabilities than what it replaced.

          morrone Christopher Morrone (Inactive) added a comment - As you say, ldiskfs is fragile. It is entirely possible that the a patch series will not apply to all of the kernels released during a single RHEL Update. Don't confuse capability with requirements. Just because we have the capability to support multiple kernels within a single RHEL update doesn't mean we have to have an explosion of patch series. We can still choose exactly when we choose. However should we ever choose to deal with that problem and have two patch series within an update, the new method prevents us from doing it. Of course, assign latest series to new kernel is not harder in current implement. Yes, it is harder in the current scheme. That capability was built into the old scheme, but in the current scheme we would have to hack additional logic on the side. This rewrite smells like the classic Not-Invented-Here-Syndrome mistake. The problem was handling quotes, so instead of making the simple change to handle quotes, the whole section was rewritten with less capabilities than what it replaced.
          ys Yang Sheng added a comment -

          I am totally understand your point. First i think trying to keep ability for special kernel version against ldiskfs series is not necessary. Just every RHEL Update is enough. ldiskfs patches were already numerous. Second, ldiskfs is very fragile. So i prefer handle it carefully rather than set a default value for a new kernel. It is risky even all patches applied. Of course, assign latest series to new kernel is not harder in current implement. I am very appreciate you point out 6[0-3] bug, It is really a good catch. I'll push a patch to fix it.

          ys Yang Sheng added a comment - I am totally understand your point. First i think trying to keep ability for special kernel version against ldiskfs series is not necessary. Just every RHEL Update is enough. ldiskfs patches were already numerous. Second, ldiskfs is very fragile. So i prefer handle it carefully rather than set a default value for a new kernel. It is risky even all patches applied. Of course, assign latest series to new kernel is not harder in current implement. I am very appreciate you point out 6 [0-3] bug, It is really a good catch. I'll push a patch to fix it.
          pjones Peter Jones added a comment -

          Yang Sheng

          Could you please advise on this issue?

          Thanks

          Peter

          pjones Peter Jones added a comment - Yang Sheng Could you please advise on this issue? Thanks Peter

          People

            ys Yang Sheng
            morrone Christopher Morrone (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: