Details
- Improvement
- Resolution: Fixed
- Minor
- None
- None
- 23,549
- 4894
Description
The current Lustre code has a limitation: single-client buffered I/O performance drops when checksums are turned on. My understanding is that buffered I/O is handled by ptlrpcd on the client, and the Lustre checksum is also calculated in this thread when the client reads data from the OSSs. Because ptlrpcd is not multithreaded in the current code, the checksum calculation consumes CPU resources and hurts Lustre performance.
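To make the cost concrete, here is a minimal sketch of what verifying a checksum on a read looks like: the client recomputes the checksum over every byte it received and compares it with the value sent by the server. This is not Lustre code. The `CHECKSUMS` table and `verify_read` function are hypothetical names, and Python's `zlib` routines stand in for the real implementations.

```python
import zlib

# Hypothetical registry of checksum algorithms (illustrative names only,
# not Lustre identifiers). Each entry maps a name to a 32-bit checksum.
CHECKSUMS = {
    "crc32": lambda buf: zlib.crc32(buf) & 0xFFFFFFFF,
    "adler": lambda buf: zlib.adler32(buf) & 0xFFFFFFFF,
}


def verify_read(data: bytes, server_checksum: int, algo: str = "adler") -> bool:
    """Recompute the checksum over the received data and compare it
    with the value the server sent along with the bulk data."""
    return CHECKSUMS[algo](data) == server_checksum


if __name__ == "__main__":
    payload = b"x" * 4096
    good = zlib.adler32(payload) & 0xFFFFFFFF
    print(verify_read(payload, good))              # matching checksum
    print(verify_read(payload[:-1] + b"y", good))  # corrupted last byte
```

Every byte read passes through the checksum function once, so if all of this runs in a single thread, that thread's checksum rate caps the client's read throughput.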
On the other hand, the latest Intel Nehalem/Westmere CPUs provide a hardware-accelerated crc32c instruction, implemented on the CPU chip itself. We can use this fast crc32c instruction at no additional cost if the server is running on Intel CPUs.
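For reference, here is a bit-by-bit software version of CRC32C (the Castagnoli polynomial, which is the one the SSE4.2 `crc32` instruction uses). It is a sketch for checking results, not the patch itself, and it is far slower than either a table-driven or a hardware implementation. The `invert_final` parameter is a hypothetical switch, included only because implementations differ on whether they apply the final bit inversion.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def crc32c(data: bytes, invert_final: bool = True) -> int:
    """Bitwise CRC32C over data; slow, for illustration only."""
    crc = 0xFFFFFFFF  # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    # The standard definition inverts all bits at the end.
    return crc ^ 0xFFFFFFFF if invert_final else crc


if __name__ == "__main__":
    # Standard CRC32C check value for the ASCII string "123456789".
    print(hex(crc32c(b"123456789")))  # 0xe3069283
```

Two implementations that disagree only on the final inversion produce checksums that differ by an XOR with `0xFFFFFFFF`, so both ends must use the same convention.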
Lustre currently supports crc and adler as checksum algorithms. So I would suggest adding crc32c as an additional checksum algorithm and enable
See bz#23549 on Bugzilla. An initial patch is available and simple testing has been done.
Performance improved considerably.
e.g.)
single client's read performance: 60% improved
1 GB/sec (adler) vs 1.6 GB/sec (crc32c w/ hardware instruction)
single client's write performance: (max) 30% improved
1.3 GB/sec (adler) vs 1.7 GB/sec (crc32c w/ hardware instruction)
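The percentages follow directly from the throughput figures above. A small sketch of the arithmetic (the `improvement_pct` helper is just an illustrative name):

```python
def improvement_pct(baseline_gbps: float, new_gbps: float) -> float:
    """Relative improvement of new over baseline, in percent."""
    return (new_gbps - baseline_gbps) / baseline_gbps * 100.0


if __name__ == "__main__":
    # Figures quoted above: adler vs crc32c with the hardware instruction.
    print(round(improvement_pct(1.0, 1.6), 1))  # read
    print(round(improvement_pct(1.3, 1.7), 1))  # write
```

The read case works out to 60%, and the write case to roughly 31%, in line with the "(max) 30%" quoted above.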
See more detail: https://bugzilla.lustre.org/attachment.cgi?id=31604
I also saw that this patch reduces CPU usage.
However, see also bz#23771: we sometimes saw "checksum protocol errors". (Not always, and we still don't know when, or under what timing, this error shows up.)
Could someone have a look at the code in the patch and give me some suggestions on how to fix it, or on how to track down bug 23771?
I wonder if we can land this patch in the Lustre mainline once we fix 23771.
Issue Links
- is related to: LU-1025 Lustre crc32c implementation not use final bit inversion (Resolved)