[LU-241] support crc32c with hardware accelerated instruction as one of lustre checksums Created: 26/Apr/11 Updated: 27/Jan/12 Resolved: 13/Oct/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.2.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Peter Jones |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Bugzilla ID: | 23,549 | ||||||||
| Rank (Obsolete): | 4894 | ||||||||
| Description |
|
The current Lustre code limits single-client performance (buffered I/O) when checksums are turned on. My understanding is that buffered I/O is handled by ptlrpcd on the client, and the Lustre checksum is also calculated in this thread when the client reads data from the OSSs. Because ptlrpcd is not multithreaded in the current code, the checksum calculation consumes CPU resources and hurts Lustre performance.

On the other hand, the latest Intel Nehalem/Westmere CPUs have a hardware-accelerated crc32c instruction implemented on the chip, so we can use the fast crc32c instruction without any additional cost if the server is running Intel CPUs. Lustre currently supports crc32 and adler as checksum algorithms, so I would suggest adding crc32c as an additional checksum algorithm and enabling it. See bz#23549 on Bugzilla.

The initial patch is available and simple testing is done. Performance improved considerably: single-client write performance improved by up to 30% (see more detail: https://bugzilla.lustre.org/attachment.cgi?id=31604). I also saw that this patch reduces CPU usage.

However, as seen in bz#23771, we sometimes get "checksum protocol errors" (not always, and we still don't know when, or under what timing, this error shows up). Could someone have a look at the code in the patch and give me suggestions on how to fix or track down bug 23771? I would like to land this patch in the Lustre mainline once 23771 is fixed. |
| Comments |
| Comment by Peter Jones [ 26/Apr/11 ] |
|
Thanks for opening a ticket for this, Ihara. We do not want to lose this useful work in progress. |
| Comment by Shuichi Ihara (Inactive) [ 08/Jun/11 ] |
|
fixed patch against bug 23549. |
| Comment by Peter Jones [ 15/Jun/11 ] |
|
Ihara, do you mean that you will be uploading a patch into Gerrit? |
| Comment by Shuichi Ihara (Inactive) [ 06/Jul/11 ] |
|
I've submitted patch sets for the master and b1_8 branches, and review is in progress. |
| Comment by Shuichi Ihara (Inactive) [ 31/Aug/11 ] |
|
Maloo keeps failing with "Node provisioning timed out". Can someone have a look at these Maloo errors? https://maloo.whamcloud.com/test_sets/4763d8b4-d394-11e0-8d02-52540025f9af |
| Comment by Jian Yu [ 31/Aug/11 ] |
|
Hello Ihara,
From the console log of client node fat-intel-1vm1, we could see:

[ Missing Package ]
You have specified that the package 'kernel-2.6.32-131.2.1.el6.x86_64' should be installed. This package does not exist. Would you like to continue or abort this installation?
[ Abort ] [ Ignore All ] [ Continue ]

For the master branch, the RHEL6 kernel version has been updated to '2.6.32-131.6.1.el6.x86_64'. For the b1_8 branch, it's still '2.6.32-131.2.1.el6.x86_64'. Since the above node provisioning failure occurred while verifying the patches for the master branch, could you please rebase your patches against the latest master branch and re-submit them to Gerrit? That will trigger a new build and avoid the above provisioning failure. For the b1_8 branch, the issue still exists; I think Chris and Mike will find a way to fix that. |
| Comment by Shuichi Ihara (Inactive) [ 31/Aug/11 ] |
|
Hi Yu Jian, can you manually kick off a rebuild of the RPMs instead of me re-submitting the patches? I can't start an RPM rebuild manually, but you probably can. Thanks |
| Comment by Jian Yu [ 31/Aug/11 ] |
|
Hi Ihara,
After you re-submit the patches, the building would be triggered automatically, and after the new RPMs are built successfully, the autotest system would also be triggered automatically to verify the patches. |
| Comment by Shuichi Ihara (Inactive) [ 31/Aug/11 ] |
|
Yes, I know, but please see below: adilger kicked off a rebuild and autotesting without re-submitting the patches. http://build.whamcloud.com/job/lustre-reviews/1916/ I don't think we really need to re-submit the patches, since Jenkins can be kicked manually for new RPM builds. Once Jenkins finishes building the RPMs, Maloo will start autotesting. That is my understanding. |
| Comment by Jian Yu [ 31/Aug/11 ] |
|
Hi Ihara, I meant that you need to rebase the patches against the latest master branch first and then re-submit them on Gerrit. From http://review.whamcloud.com/#change,1009 we could see that patch set 6 was based on commit a832ab57bda8658457193cc670b72a9995f10ff0 (
| Comment by Shuichi Ihara (Inactive) [ 31/Aug/11 ] |
|
Hi Yu Jian, Many thanks! Thanks |
| Comment by Jian Yu [ 01/Sep/11 ] |
|
You're welcome, Ihara! |
| Comment by Build Master (Inactive) [ 05/Oct/11 ] |
|
Integrated in Oleg Drokin : 0517160dd68ac026513ad1b8e3e6f7abd4acfdef
|
| Comment by Peter Jones [ 13/Oct/11 ] |
|
Landed for 2.2. Unlikely to consider landing this on 1.8.x at this point. |
| Comment by Andreas Dilger [ 27/Jan/12 ] |
|
Are there any users of this feature in production? Since the CRC32C patch was landed after 2.1.0 was released, we are planning to land the |