  Lustre / LU-7659

Replace KUC by more standard mechanisms

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • Upstream
    • Lustre 2.10.0

    Description

      The Kernel Userland Communication (KUC) subsystem is a Lustre-specific API for something relatively common: delivering a stream of records from the kernel to userland, and transmitting feedback from userland back to the kernel. We propose to replace it with character devices.

      Besides being more standard, this can also increase performance significantly, because a process can read large chunks from the character device at once. Our proof of concept shows a 5-10x speedup when reading changelogs in 4 KiB blocks.
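To illustrate the chunked-read pattern behind that speedup, a consumer can pull many records per read() call instead of receiving one message per record. The C sketch below treats the device as an opaque byte stream (no changelog record parsing); the 4096-byte block size mirrors the proof-of-concept figure, and drain_fd and KUC_CHUNK are hypothetical names. It illustrates the access pattern only and makes no performance claim of its own.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define KUC_CHUNK 4096 /* block size used in the proof-of-concept measurement */

/* Hypothetical helper: read everything available from `fd` in 4 KiB blocks
 * until end of stream, passing each block to `consume` (may be NULL).
 * Records are treated as opaque bytes. Returns total bytes read, or -1. */
ssize_t drain_fd(int fd, void (*consume)(const char *, size_t, void *),
                 void *arg)
{
    char buf[KUC_CHUNK];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break; /* end of stream: no more records */
        if (consume)
            consume(buf, (size_t)n, arg);
        total += n;
    }
    return total;
}
```

On a blocking device, read() would wait for new records instead of returning 0, so this loop would not terminate on its own; on a device opened with O_NONBLOCK it would fail with EAGAIN, which this sketch treats as an error.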

      I would like feedback and suggestions. The proposed implementation works as follows:

      • Register a misc char device in mdc_setup (e.g. /dev/changelog-lustre0-MDT0000). The minor number is associated with the corresponding OBD.
      • The .open handler starts a background thread that iterates over the llog and enqueues up to X records into a ring buffer.
      • The .read handler dequeues records from the ring buffer. It can be made either blocking or non-blocking.
      • The .release handler stops the background thread and releases resources.
      • Changelog clear is not implemented yet. It could be either a .write or an .unlocked_ioctl handler. Which would be preferable?
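The .open/.read split described above is a bounded producer/consumer queue: the background thread enqueues up to X records and .read dequeues them. The single-threaded C sketch below shows only that ring buffer, assuming a fixed capacity (RING_CAP stands in for X) and fixed-size record slots; struct ring, struct rec and the function names are hypothetical. In the kernel, the two sides run in different threads, so these operations would also need a lock and a wait queue (for blocking reads and .poll()), which are omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 8 /* stands in for "up to X records" */

/* Hypothetical fixed-size record slot; real ChangeLog records vary in length. */
struct rec {
    uint64_t index;
};

struct ring {
    struct rec slot[RING_CAP];
    size_t head;  /* next slot to dequeue */
    size_t count; /* number of queued records */
};

/* Producer side (the background llog iterator): fails when the ring is full,
 * so the thread would have to wait for the reader to make room. */
bool ring_enqueue(struct ring *r, struct rec rec)
{
    if (r->count == RING_CAP)
        return false;
    r->slot[(r->head + r->count) % RING_CAP] = rec;
    r->count++;
    return true;
}

/* Consumer side (.read): fails when the ring is empty; a blocking .read
 * would sleep here, a non-blocking one would return -EAGAIN. */
bool ring_dequeue(struct ring *r, struct rec *out)
{
    if (r->count == 0)
        return false;
    *out = r->slot[r->head];
    r->head = (r->head + 1) % RING_CAP;
    r->count--;
    return true;
}
```

Bounding the ring keeps kernel memory use fixed regardless of how far behind a reader falls; the llog itself remains the durable backlog.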

      The copytool implementation has not been done yet, but it would work in a similar way.


          Activity


            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/34258
            Subject: LU-7659 hsm: Use netlink for KUC communication
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e3e1c2f7fa81dd149308b41c74ad190e32c858ae


            Yohan Pipereau (yohan.pipereau.ocre@cea.fr) uploaded a new patch: https://review.whamcloud.com/32941
            Subject: LU-7659 libcfs: Use netlink for KUC communication
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 31a839a7e846b7e53c0e846452435fffe83a0585

            Peter Jones (pjones) added a comment -

            Landed for 2.10

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/18900/
            Subject: LU-7659 mdc: expose changelog through char devices
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1d40214d96dd6e36bd39a35f8419f753bae8d305


            Andreas Dilger (adilger) added a comment -

            Yes, we've discussed changing llogs over to use an index instead of a flat file. The benefit of the llog file is that it can be written mostly sequentially, and record cancellation only needs to update the bitmap in the header. The drawbacks are that updating the header is serialized, that reserving space in the llog file is difficult if the record size is unknown, and that there is added complexity because the order of the log records does not match the order in which transactions are completed.

            On a related note, did you look into connecting the LFSCK iterator to the new char interface to speed up the initial scanning for RobinHood (RBH)?
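The llog trade-off described in this comment (append records mostly sequentially, cancel a record by clearing a bit in the header) can be illustrated with a toy in-memory model. This is not the on-disk llog format: struct toy_llog_hdr, LLOG_SLOTS and the helper functions are hypothetical. It shows why cancellation is cheap (one bit) and why every cancellation writes the same shared header, which is where the serialization comes from.

```c
#include <stdbool.h>
#include <stdint.h>

#define LLOG_SLOTS 256 /* hypothetical fixed number of record slots */

/* Toy model of an llog header: one bit per record slot, set while live. */
struct toy_llog_hdr {
    uint32_t bitmap[LLOG_SLOTS / 32];
    uint32_t live;      /* number of live (uncancelled) records */
    uint32_t next_slot; /* records are appended sequentially */
};

/* Append: claim the next slot and mark it live. Returns the slot, or -1 if full. */
int toy_llog_append(struct toy_llog_hdr *h)
{
    if (h->next_slot == LLOG_SLOTS)
        return -1;
    int idx = (int)h->next_slot++;
    h->bitmap[idx / 32] |= 1u << (idx % 32);
    h->live++;
    return idx;
}

/* Cancel: clear the record's bit in the shared header. Every cancellation
 * writes the same header, so concurrent cancellations must be serialized. */
bool toy_llog_cancel(struct toy_llog_hdr *h, int idx)
{
    if (idx < 0 || idx >= LLOG_SLOTS)
        return false;
    uint32_t mask = 1u << (idx % 32);
    if (!(h->bitmap[idx / 32] & mask))
        return false; /* already cancelled or never written */
    h->bitmap[idx / 32] &= ~mask;
    h->live--;
    return true;
}
```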

            Henri Doreau (hdoreau, Inactive) added a comment -

            Andreas, I realize that I have not answered your questions yet; sorry about that. See below.

            > Would this new mechanism be able to handle multiple ChangeLog consumers?

            Yes, multiple processes can open the char device. By default, each one starts reading from the beginning of the llog, and it can lseek to a given record to start from there. In this sense it is very similar to the existing implementation.

            > My preference would be to use read and write for the interface instead of ioctl, since these can be used even from scripts.

            Done.

            > I would have suggested a /proc file instead of a char device, but new /proc files are frowned upon, and /sys files hold only one value per file. The (minor) issue with a char device is the registration of the char major/minor, but it could use a misc char device?

            It is a misc char device.

            > The .llseek() operation should allow seeking to a specific record, so that if there are multiple consumers and old records are not yet cancelled, the new records can be found easily.

            Done, using the record number as the offset to jump to.

            > The char device should also have a .poll() method so that userspace can wait for new records efficiently instead of busy looping.

            Done.
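A userspace consumer might combine the two answers above: lseek to a starting record number, then poll() before reading so it sleeps instead of busy looping. The C sketch below assumes, as stated above, that the device interprets the lseek offset as a record number; consume_from is a hypothetical helper, and records are treated as opaque bytes. The fd would come from opening a device such as /dev/changelog-lustre0-MDT0000 from the description.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: position the stream at record `start_rec`, wait in
 * poll() until data is readable (or `timeout_ms` expires; -1 waits forever),
 * then read up to `len` bytes.
 * Returns bytes read, 0 on timeout or end of stream, -1 on error. */
ssize_t consume_from(int fd, off_t start_rec, char *buf, size_t len,
                     int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    /* On the proposed device the offset is a record number, not a byte offset. */
    if (lseek(fd, start_rec, SEEK_SET) == (off_t)-1)
        return -1;

    do {
        rc = poll(&pfd, 1, timeout_ms); /* sleep until records are available */
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0; /* timed out: no new records */

    return read(fd, buf, len);
}
```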

            > One issue that had come up with ChangeLogs in the past was that they are single-threaded in the kernel, which limits performance during metadata operations. If we are changing the API in userspace, it might also be good to change the on-disk format to allow multiple ChangeLog files to be written in parallel. Probably not one per core (that may become too many on large MDS nodes), but maybe 4-8 or so. The records could be merge-sorted in the kernel by the helper thread at read time.

            I'd love that. It is beyond the scope of this patch, I'd say, but I will keep it in mind. Maybe indexes instead of llog catalogs?
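The parallel-ChangeLog idea discussed here (several files written concurrently, merge-sorted by the helper thread at read time) amounts to a k-way merge. The C sketch below assumes each per-file stream is already sorted by a global record index and models each stream as an in-memory array; NSTREAMS, struct stream and merge_streams are hypothetical. With only 4-8 streams, a linear scan for the smallest head is adequate; a heap would only pay off with many more streams.

```c
#include <stddef.h>
#include <stdint.h>

#define NSTREAMS 4 /* hypothetical number of parallel ChangeLog files */

/* One sorted input stream (stands in for one ChangeLog file). */
struct stream {
    const uint64_t *idx; /* record indices, ascending */
    size_t len;
    size_t pos;          /* next unread record */
};

/* Merge all streams into `out` (capacity `cap`) in ascending index order.
 * Returns the number of records written. */
size_t merge_streams(struct stream s[NSTREAMS], uint64_t *out, size_t cap)
{
    size_t n = 0;

    while (n < cap) {
        int best = -1;
        /* Linear scan for the smallest head record: fine for 4-8 streams. */
        for (int i = 0; i < NSTREAMS; i++) {
            if (s[i].pos < s[i].len &&
                (best < 0 || s[i].idx[s[i].pos] < s[best].idx[s[best].pos]))
                best = i;
        }
        if (best < 0)
            break; /* all streams exhausted */
        out[n++] = s[best].idx[s[best].pos++];
    }
    return n;
}
```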


            Quentin Bouget (quentin.bouget.ocre@cea.fr) uploaded a new patch: http://review.whamcloud.com/20502
            Subject: LU-7659 mdc: revise copytool char device locking
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ebae9fe2b1fdf458638cabbde8a02fc1522ebb75


            Quentin Bouget (quentin.bouget.ocre@cea.fr) uploaded a new patch: http://review.whamcloud.com/20501
            Subject: LU-7659 mdc: add an ioctl call to the copytool char device
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3d4f22eab430874560db611f9bd95fb31f63350f


            Quentin Bouget (quentin.bouget.ocre@cea.fr) uploaded a new patch: http://review.whamcloud.com/20327
            Subject: LU-7659 mdc: expose hsm requests through char devices
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 52935cd7190a1b4d4b5def5a9244ce1e5ca60c3a


            James A Simmons (simmonsja) added a comment -

            As I explore netlink, I wonder whether that API could be used for this. In my research I discovered that it is used by the SCSI layer, which surprised me.

            People

              Yohan Pipereau (ypo, Inactive)
              Henri Doreau (hdoreau, Inactive)
              Votes: 0
              Watchers: 22
