[LU-8533] Problems with ldiskfs patch series selection logic Created: 24/Aug/16 Updated: 06/Oct/16 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Christopher Morrone | Assignee: | Yang Sheng |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | llnl | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The mentioned change in the RHEL_RELEASE macro was, to the best of my knowledge, simply that they started putting quotes around the number. So for instance it went from something like this: #define RHEL_RELEASE 200 to something like this: #define RHEL_RELEASE "300" It would have been a pretty simple matter to hand either quotes or no quotes. In fact I made such a change locally. I'm not sure why this required a complete change of the logic. But the new logic introduced a few problems, and I would argue that it should be reverted. First of all, the commit message subject claims that the new method is more accurate than the previous. This is obviously incorrect. The new method allows significantly less accuracy than the previous method. In the old method, we could target specific RHEL kernel versions, but in the new method we can only have a single series file per RHEL distro version. If there are kernel updates within a rhel update, there is no longer any way to specify different series files for each kernel. The previous method allowed that. Next, the new method absolutely guarantees that Lustre can never build for any new RHEL version that comes out without developer intervention. The old method supported kernel versions and more importantly, ranges of kernel versions. The last range was always X version AND NEWER. So the most recent series file could always be tried with newer kernels. If there was not too much change in the kernel, the series file can be applied cleanly and no developer intervention is needed. I would much prefer that we try to have the latest series applied to newer kernels, and fail during patch application time, then to fail to even try to apply any series files at all. (Currently it fails to even try). And it is also worth noting that the patch introduced this bug: + 6[0-3]) LDISKFS_SERIES="2.6-rhel6.series" ;; This range does not work. Square brackets are used as quotes in autoconf's m4 system, so these square brackets disappear. The 2.6-rhel6.series will never be used as written. In summary, the logic change introduced these problems:
I would very much like to see this fixed before 2.9 comes out. |
| Comments |
| Comment by Peter Jones [ 24/Aug/16 ] |
|
Yang Sheng Could you please advise on this issue? Thanks Peter |
| Comment by Yang Sheng [ 25/Aug/16 ] |
|
I am totally understand your point. First i think trying to keep ability for special kernel version against ldiskfs series is not necessary. Just every RHEL Update is enough. ldiskfs patches were already numerous. Second, ldiskfs is very fragile. So i prefer handle it carefully rather than set a default value for a new kernel. It is risky even all patches applied. Of course, assign latest series to new kernel is not harder in current implement. I am very appreciate you point out 6[0-3] bug, It is really a good catch. I'll push a patch to fix it. |
| Comment by Christopher Morrone [ 25/Aug/16 ] |
|
As you say, ldiskfs is fragile. It is entirely possible that the a patch series will not apply to all of the kernels released during a single RHEL Update. Don't confuse capability with requirements. Just because we have the capability to support multiple kernels within a single RHEL update doesn't mean we have to have an explosion of patch series. We can still choose exactly when we choose. However should we ever choose to deal with that problem and have two patch series within an update, the new method prevents us from doing it.
Yes, it is harder in the current scheme. That capability was built into the old scheme, but in the current scheme we would have to hack additional logic on the side. This rewrite smells like the classic Not-Invented-Here-Syndrome mistake. The problem was handling quotes, so instead of making the simple change to handle quotes, the whole section was rewritten with less capabilities than what it replaced. |
| Comment by Yang Sheng [ 26/Aug/16 ] |
|
I think fragile can hardly be a reason to keep different patch series in single RHEL Update. In fact, I don't think it is a good practice to keep multiple series file for single Distro. |
| Comment by Christopher Morrone [ 26/Aug/16 ] |
I don't see how that argument is relevant. The previous system did not force anyone to have extra patch series. It just allowed it when it was needed. The current system needlessly makes it more difficult.
I'm sorry, but that first claim (more clean) is very arguable, and the last two claims (more agile, able to handle all situations) are demonstrably false. Explain how the current system handles this situation:
It can't, right? At least not without adding additional dirty logic on the side. You can only choose one patch series per RHEL_MAJOR+RHEL_MINOR. But there are multiple kernels released with matching RHEL_MAJOR+RHEL_MINOR. RHEL_MAJOR+RHEL_MINOR is not a very unique way of identifying a kernel. While RHEL is unlikely to change significant interaces in the kernel within a single RHEL update, ldiskfs patch series can be derailed by even a simple contextual change in ext4. Those can, and do, happen. The old system could easily handle this situation. It was, therefore, the more agile of the two solutions. Next, the current system introduces a guaranteed compilation failure every time the RHEL_MAJOR+RHEL_MINOR changes. While it is common to need a new patch series when that happens, it is not guaranteed. The new system has introduced a guaranteed failure into the process where previously there was only a likely failrue. It was possible with the old system that the most recent patch series could apply. And even if we accept that it is very likely that a new patch series will be needed with every new RHEL update, I would rather the build system fail during patch application, and tell me exactly where the first failure is rather than having refuse to even try to apply a patch series. I would argue that refusing to apply a patch series without knowing that it will fail is also a less agile approach. You have made it so that I can't add kernel support easily for kernels that Intel does not support without adding additional messy configure logic, rewriting the logic entirely, or overwriting the upstream patch series. None of those choices are desirable. I want to be able to easily add a new kernel version and patch series if an upstream one does not suite my needs. This is not theoretical. I have had to make ldiskfs patch series many, many times. The previous kernel selection logic was designed to meet all of our needs, not just yours. You have made things harder for me. I'm your customer. Please fix it. |
| Comment by Yang Sheng [ 22/Sep/16 ] |
|
Hi, Chris, So as you comment. Could you submit a patch to change it as you like please? Thanks, |
| Comment by Christopher Morrone [ 06/Oct/16 ] |
|
No. I want you to put it back the way it was. |