Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Critical
    • None
    • Lustre 2.4.0
    • Bugzilla ID: 21524
    • 4869

    Description

      Remove the Lustre kernel patches so that Lustre servers can be ported to new kernels more easily and built against vendor kernels without modifying the vendor kernel RPMs. There are a number of different patches; each one either needs to be replaced with equivalent functionality that already exists in the kernel, or needs work to get the patch accepted upstream.

      Corresponding Bugzilla ticket:
      https://bugzilla.lustre.org/show_bug.cgi?id=21524

      Attachments

        1. fio_sdck_block_size_read.png (41 kB)
        2. fio_sdck_block_size_write.png (41 kB)
        3. fio_sdck_io_depth_read.png (36 kB)
        4. fio_sdck_io_depth_write.png (39 kB)
        5. mdtest_create_8thr.png (62 kB)
        6. mdtest_remove_8thr.png (72 kB)
        7. mdtest_stat_8thr.png (77 kB)
        8. sgpdd_16devs_rsz_read.png (47 kB)
        9. sgpdd_16devs_rsz_write.png (46 kB)

        Issue Links

          Activity

            [LU-20] patchless server kernel

            morrone Christopher Morrone (Inactive) added a comment -

            Until that patch (http://review.whamcloud.com/23050 from LU-8685) is included into RHEL7, we will continue to patch the RHEL kernel shipped with Lustre to fix that bug.

            I don't understand that logic at all. A patched kernel could have been built completely externally to the Lustre tree, allowing us to continue forward with completing this ticket. I really don't understand why this was deemed a blocker, or why it had to happen sequentially.

            Maybe I'll restate what I think the goal of this ticket really is: eliminate the need for a "Lustre kernel" by eliminating all of the Lustre-specific kernel patches (ignoring ldiskfs).

            The jbd2 fix, while affecting Lustre, is not necessarily Lustre-specific. Therefore it does not need to live in lustre/kernel_patches, and we don't need infrastructure in Lustre's main build system to pause in the middle of building Lustre to go patch, build, and package a kernel. Instead, patching, building, and packaging the kernel can be a completely external process that takes place before, and independently of, each Lustre build.

            That's the goal. Nothing that I can see really stands in the way of that goal, unless I'm missing something (and I did read LU-8685).
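
            For illustration, a rough sketch of what such an external "patch, build, package" step might look like for a RHEL-style kernel, kept entirely outside the Lustre tree (the patch file name, kernel version, and paths below are placeholders, not anything shipped by Lustre or Red Hat):

                # Sketch: rebuild a vendor kernel with one extra fix, entirely outside
                # the Lustre tree (patch file name and kernel version are placeholders).
                rpm -ivh kernel-3.10.0-514.el7.src.rpm        # unpack sources into the rpmbuild tree
                cp jbd2-fix.patch ~/rpmbuild/SOURCES/         # hypothetical standalone patch file
                # Add the patch to kernel.spec (a PatchNNNN: tag plus its apply step), then:
                rpmbuild -bb ~/rpmbuild/SPECS/kernel.spec
                # Install the resulting kernel RPMs before configuring Lustre, so the
                # Lustre build itself never stops to patch, build, or package a kernel.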

            simmonsja James A Simmons added a comment -

            Great news! A new RHEL 7.3 kernel has been released and it has the jbd2 fix. Time to move to kernel-3.10.0-514.21.1.el7. Patchless servers are again within our grasp.

            adilger Andreas Dilger added a comment -

            Per Peter's previous comment:

            we need a version of RHEL 7.x which includes the fix for the upstream bug "jbd2: incorrect unlock on j_list_lock"

            Until that patch (http://review.whamcloud.com/23050 from LU-8685) is included into RHEL7, we will continue to patch the RHEL kernel shipped with Lustre to fix that bug.

            Of course, it is possible for anyone to use an unpatched kernel today with ZFS, or to build and run Lustre with a RHEL6 kernel, and this has been true for at least a couple of releases. The presence of kernel patches in the Lustre tree doesn't prevent that. While Intel will continue to apply the kernel patches until such time as LU-684 and LU-8685 are fixed, it doesn't prevent others from building their Lustre RPMs differently.
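
            As an illustration of the unpatched-kernel path mentioned above, a ZFS-only server build against a stock vendor kernel might look roughly like this (the configure option names reflect the Lustre autoconf options of that era and may differ between releases; all paths are placeholders):

                # Build Lustre server modules against a stock, unpatched vendor kernel,
                # using ZFS as the backend and skipping ldiskfs; no kernel patching involved.
                sh autogen.sh
                ./configure --enable-server --disable-ldiskfs \
                            --with-linux=/usr/src/kernels/3.10.0-514.el7.x86_64 \
                            --with-zfs=/usr/src/zfs-0.6.5
                make rpms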

            morrone Christopher Morrone (Inactive) added a comment -

            I was with you until the last sentence. What relies on Red Hat's schedule, and why?
            pjones Peter Jones added a comment - edited

            Yes, of course, finding a resolution for LU-684 is our ultimate goal, and work on that continues. My strong preference would have been for that to be resolved ahead of the code freeze but, as that is by no means certain at present, I was looking at whether we could adopt a contingency of having two build options: one as today, for use in testing, and the other patchless and, while usable in production, not usable for all tests. This is a suggestion that has been made by several community members who are anxious to take advantage of patchless servers. Unfortunately this relies on Red Hat's schedule to be practical.

            adegremont Aurelien Degremont (Inactive) added a comment -

            I'm backing Chris here. LU-684 is definitely THE ticket that needs to be closed to move forward with patchless servers!

            morrone Christopher Morrone (Inactive) added a comment - edited

            Peter, I don't think that is how we should look at this ticket. While it is true that the kernel has a bug there that is a problem for Lustre, Lustre can build and run without that patch. We could, and should, move that patch out of the Lustre tree. It is a kernel patch, and should be housed in a kernel repository.

            What is really holding up this ticket is still subtask 2, LU-684. The dev_rdonly patch is, as I understand it, an entirely Lustre-specific patch for the kernel. It will never be upstreamed, and it will always be a burden on the Lustre developers to maintain.

            Once we finally finish LU-684, it will be possible to reasonably delete the "lustre/kernel_patches" directory from the Lustre repository, and make a much cleaner separation between building the kernel and building Lustre.

            So LU-684 remains the real blocker to calling this ticket complete. But even once it is done, there will still be some minor work to remove lustre/kernel_patches from the tree before this ticket is closed.
            pjones Peter Jones added a comment -

            In order to go to patchless servers for ldiskfs deployments, we need a version of RHEL 7.x which includes the fix for the upstream bug "jbd2: incorrect unlock on j_list_lock". We have been told that this will be available in the near future, but at this stage it seems more likely to be available in the 2.10.1 timeframe rather than 2.10.0.

            morrone Christopher Morrone (Inactive) added a comment -

            Oh, sorry, 26220 hasn't landed yet. I'll just -1 it.

            morrone Christopher Morrone (Inactive) added a comment -

            Sigh. That approach is almost entirely impractical when it comes time to do the packaging. Not only that, it would appear to break the install of kmp-lustre-tests for builds with ldiskfs but without a patched kernel, which is exactly the situation we're trying to keep support for.

            Looks like someone should open another 2.10 blocker.

            simmonsja James A Simmons added a comment -

            Chris, I have a question. With the patch https://review.whamcloud.com/26220 we can build osd-ldiskfs modules that work with both patched and unpatched kernels without a rebuild. Now, if I install the osd-ldiskfs modules on an unpatched kernel, I see no errors about dev_read_only being missing. So the question is: how could we add an install script to the spec file that detects a patched kernel and installs the special osd-ldiskfs module?
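
            One way such a detection could be sketched is a small check in the spec file's %post scriptlet. This is only a minimal sketch: the dev_set_rdonly symbol, the %{kversion} macro, the module variant file names, and the install paths are all assumptions for illustration, not an agreed-upon design.

                # %post sketch: pick the osd-ldiskfs variant based on whether the target
                # kernel exports the dev_set_rdonly symbol added by the dev_rdonly patch.
                kver=%{kversion}                              # kernel the kmod was built for (macro name may differ)
                if grep -q ' dev_set_rdonly$' /boot/System.map-${kver} 2>/dev/null; then
                    # patched kernel: install the dev_rdonly-aware module variant (hypothetical file name)
                    cp %{_libdir}/lustre/osd_ldiskfs.ko.patched \
                       /lib/modules/${kver}/extra/lustre/fs/osd_ldiskfs.ko
                else
                    # unpatched vendor kernel: install the generic module variant (hypothetical file name)
                    cp %{_libdir}/lustre/osd_ldiskfs.ko.generic \
                       /lib/modules/${kver}/extra/lustre/fs/osd_ldiskfs.ko
                fi
                depmod -a ${kver}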

            People

              green Oleg Drokin
              yong.fan nasf (Inactive)
              Votes: 0
              Watchers: 35
