Details
Description
The Kernel Userland Communication (KUC) subsystem is a Lustre-specific API for something relatively common: delivering a stream of records from the kernel to userland and transmitting feedback from userland to the kernel. We propose to replace it with character devices.
Besides being more standard, this can also increase performance significantly, because a process can read large chunks from the character device in a single call. Our proof of concept shows a 5-10x speedup when reading changelogs in blocks of 4k.
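To illustrate the access pattern behind that speedup, here is a minimal userspace sketch of reading a descriptor in fixed-size blocks. The helper name `read_in_chunks` is hypothetical, and the record framing inside each block is deliberately left out because it is not specified here.

```python
import os


def read_in_chunks(fd, chunk_size=4096):
    """Yield successive blocks read from fd until end of stream.

    Each os.read() call is one read(2) syscall, so a larger chunk_size
    means fewer kernel/userland transitions for the same amount of data.
    """
    while True:
        block = os.read(fd, chunk_size)
        if not block:  # an empty read signals end of stream
            return
        yield block
```

With the proposed device this would presumably be used as `fd = os.open("/dev/changelog-lustre0-MDT0000", os.O_RDONLY)` followed by a loop over `read_in_chunks(fd)`, with the caller splitting each block into changelog records.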
I would like feedback and suggestions. The proposed implementation works as follows:
- Register a misc char device at mdc_setup (e.g. /dev/changelog-lustre0-MDT0000). The minor number is associated with the corresponding OBD.
- The .open handler starts a background thread that iterates over the llog and enqueues up to X records into a ring buffer.
- The .read handler dequeues records from the ring buffer. It can be made blocking or non-blocking.
- The .release handler stops the background thread and releases resources.
- Changelog clear is not yet implemented. It could be either a .write or an .unlocked_ioctl handler. Which would be preferable?
The implementation for the copytool has not been done yet but would work in a similar way.
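The open/read/release flow listed above can be modelled in userspace roughly as follows. This is only a sketch of the producer/consumer logic, not the kernel code: the class name `ChangelogReaderModel`, the `capacity` parameter (standing in for "X"), and the choice of `b""` for end of log and `BlockingIOError` for an empty non-blocking read are all assumptions made for illustration.

```python
import queue
import threading


class ChangelogReaderModel:
    """Userspace model of the proposed .open/.read/.release handlers."""

    _END = object()  # sentinel marking the end of the log

    def __init__(self, records, capacity=8):
        self._records = records  # stands in for the llog
        self._ring = queue.Queue(maxsize=capacity)  # holds up to X records
        self._stop = threading.Event()
        self._producer = None
        self._eof = False

    def open(self):
        # .open: start the background thread that fills the ring buffer.
        self._producer = threading.Thread(target=self._fill, daemon=True)
        self._producer.start()

    def _put(self, item):
        # Retry with a timeout so a full ring cannot block release().
        while not self._stop.is_set():
            try:
                self._ring.put(item, timeout=0.05)
                return True
            except queue.Full:
                continue
        return False

    def _fill(self):
        for rec in self._records:
            if not self._put(rec):
                return  # stopped by release()
        self._put(self._END)

    def read(self, block=True):
        # .read: dequeue one record; b"" means end of log.
        if self._eof:
            return b""
        try:
            item = self._ring.get(block=block)
        except queue.Empty:
            raise BlockingIOError("no record queued yet") from None
        if item is self._END:
            self._eof = True
            return b""
        return item

    def release(self):
        # .release: stop the background thread and wait for it to exit.
        self._stop.set()
        if self._producer is not None:
            self._producer.join()
```

The bounded queue is what limits how far the producer can run ahead of the reader; the producer polls the stop flag so that releasing the device while the ring is full does not hang.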
Issue Links
- is related to
  - LU-12506 Client unable to mount filesystem with very large number of MDTs (Resolved)
  - LU-11626 mdc: obd might go away while referenced by code in mdc_changelog (Resolved)
  - LU-9680 Improve the user land to kernel space interface for lustre (In Progress)
  - LU-10141 Integer overflow in llapi_changelog_start (Resolved)
- is related to
  - LU-15373 changelog improvements tracking (Open)
  - LU-10968 add coordinator bypass upcalls for HSM archive and remove (Reopened)
Andreas, I realize that I have not answered your questions, sorry about that. See below.
Yes, multiple processes can open the char device. By default they start reading from the beginning of the llog and they can lseek to wherever they want in the log to start at a given record. Very similar to the existing implementation in this sense.
Done.
It is a misc char device.
Done, using the record number as the offset to jump to.
Done.
I'd love that. It is beyond the scope of this patch, I'd say, but I'll keep it in mind. Maybe indexes instead of llog catalogs?
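For reference, the lseek behaviour described in the answers above would look roughly like this from a reader's point of view. This is a sketch under assumptions: `open_changelog_at` is a hypothetical helper, and it relies on the device interpreting the seek offset as a record number as stated above (on a regular file the same call simply seeks to a byte offset).

```python
import os


def open_changelog_at(path, start_record=0):
    """Open path read-only and position the descriptor at start_record.

    Assumption: the changelog device treats the lseek offset as a
    record number, so reading resumes at that record.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if start_record:
            os.lseek(fd, start_record, os.SEEK_SET)
    except OSError:
        os.close(fd)  # do not leak the descriptor if the seek fails
        raise
    return fd
```

A consumer that has already processed records up to N could then reopen with `open_changelog_at("/dev/changelog-lustre0-MDT0000", N + 1)` instead of rereading from the beginning of the llog.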