[LU-1115] software raid6 related BUG in fs/bio.c:222 when raid chunk > 64k Created: 17/Feb/12 Updated: 22/Feb/13 Resolved: 22/Dec/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.7 |
| Fix Version/s: | Lustre 2.1.4, Lustre 1.8.9 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Robin Humble | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
x86_64, centos5/rhel5, server, software raid 8+2 raid6 with 128k chunks |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6451 |
| Description |
|
RedHat have changed drivers/md/raid5.c between kernels 2.6.18-238.12.1.el5 (1.8.6) and 2.6.18-274.3.1.el5 (1.8.7) (see attached diff), and I think those changes might be interacting with the Lustre md raid5/6 patches and causing the kernel to BUG. The 2.6.18-274.3.1.el5 + lustre 1.8.7 kernel works fine with an md raid6 8+2 setup with 64k raid chunks, but with 128k raid chunks it BUGs pretty much immediately when the first Lustre traffic starts. Another site has seen the same problem with 256k raid chunks and the stock 1.8.7 server rpm. One data point: if I revert RedHat's raid5.c back to the previous version (e.g. from 2.6.18-238.12.1.el5 as used with lustre 1.8.6) then everything seems ok - 128k chunks work, and I'm told 256k does as well. I don't understand enough of the bio and raid5 logic to know why this helps, but maybe it's a hint.
A typical BUG looks like: 2012-02-13 16:55:10 ----------- [cut here ] --------- [please bite here ] --------- |
| Comments |
| Comment by Robin Humble [ 20/Feb/12 ] |
|
I've bisected the problem to these two patches: raid5-large-io-rhel5.patch, and a second patch which is a refactoring of the first. If I apply all the standard rhel5 server patches except these two, then md raid6 works. If I apply just the first patch, the kernel BUGs as before. I wrote the first patch, but it was a long time ago now and I can't remember where I got the idea/justification for it. I'll try to figure it out, but would appreciate any help. These patches allow 1M i/o's from Lustre to get through to the raid code without being split up; write performance suffers considerably if they are omitted. Depending on raid chunk size, some percentage of all software raid users will simply see a crashed kernel with stock 1.8.7-wc1 lustre, whereas it all worked fine in 1.8.6-wc1, so I don't understand why more folks haven't reported this problem. Perhaps they've just gone back to 1.8.6-wc1... |
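For illustration, a minimal sketch of the kind of tuning described above (letting a 1MB Lustre I/O reach the md raid code as a single bio) is below. This is not the contents of raid5-large-io-rhel5.patch; the function name, the values, and where it would be called are assumptions for the example.

```c
/*
 * Illustrative sketch only -- not raid5-large-io-rhel5.patch.
 * The idea is to raise the md device's request-queue limits so that a 1MB
 * Lustre write (2048 512-byte sectors, 256 4K pages) is not split by the
 * block layer before it reaches the raid5/6 stripe handling code.
 * In raid5 this would be applied to mddev->queue at array start-up; the
 * exact values here are assumptions for the example.
 */
#include <linux/blkdev.h>

static void example_allow_large_io(struct request_queue *q)
{
	blk_queue_max_sectors(q, 2048);		/* 1MB in 512-byte sectors */
	blk_queue_max_phys_segments(q, 256);	/* one segment per 4K page */
	blk_queue_max_hw_segments(q, 256);
}
```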
| Comment by Robin Humble [ 01/Mar/12 ] |
|
After looking at this some more, I think RedHat just made a mistake. The diff that RedHat cherry-picked from mainline for RHEL5.7 is basically the commit that squirrels a second segment count away in the high-order bytes of bio->bi_phys_segments, and the very next mainline commit, "block: make bi_phys_segments an unsigned int instead of short", restores the behaviour so that bio->bi_phys_segments has a usable 4 bytes again. So I think RedHat's patch to raid5.c took the first change without the follow-up, and IMHO it is safe to revert all or part of the RedHat patch in order to let bio->bi_phys_segments use all 4 bytes again. Nothing in raid5.c uses the *_bi_hw_segments functions, or the high-order bytes that are squirreled away in bi_phys_segments. md_raid5_fix_rhel5.7.patch is an attempt to revert part of RedHat's patch so that > 255 bio's are available again, or the whole thing can be reverted as per md_raid5_2.6.18-238.12.1.el5_to_2.6.18-274.3.1.el5.diff |
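For context, a hedged sketch of the packing described above: once bi_phys_segments is a full unsigned int, the mainline raid5.c helpers keep the bio's real physical-segment count in the low 16 bits and squirrel raid5's own per-bio counter into the high 16 bits. This is paraphrased from memory of the mainline helpers, not the RHEL5.7 backport.

```c
/*
 * Paraphrased sketch of the mainline-style raid5 helpers (not the RHEL5.7
 * backport): bio->bi_phys_segments is treated as two packed 16-bit counters.
 * If the field is only an unsigned short, each half shrinks to 8 bits and
 * the counter overflows past 255 bios -- the limit discussed above.
 */
#include <linux/bio.h>

static inline int raid5_bi_phys_segments(struct bio *bio)
{
	return bio->bi_phys_segments & 0xffff;		/* low 16 bits */
}

static inline int raid5_bi_hw_segments(struct bio *bio)
{
	return (bio->bi_phys_segments >> 16) & 0xffff;	/* high 16 bits */
}

static inline void raid5_set_bi_hw_segments(struct bio *bio, unsigned int cnt)
{
	bio->bi_phys_segments = raid5_bi_phys_segments(bio) | (cnt << 16);
}
```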
| Comment by Peter Jones [ 05/Apr/12 ] |
|
Yangsheng, could you please check whether this problem still exists in the latest kernel update? Thanks, Peter |
| Comment by Yang Sheng [ 06/Apr/12 ] |
|
Looks like this issue still exists in the latest rhel5.8 kernel. As Robin pointed out, we may carry http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5b99c2ffa980528a197f26c7d876cceeccce8dd5 in our series as a solution. Then we can simply remove it once Redhat also includes this change. |
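The net effect of carrying that change is a widening of the bi_phys_segments field in struct bio. A hedged before/after sketch is below; the struct names are made up for illustration, and this is from memory rather than the literal diff.

```c
/*
 * Hedged sketch (not the literal diff) of the change being carried into the
 * Lustre kernel patch series: bi_phys_segments is widened from 16 to 32 bits,
 * so raid5 can pack two 16-bit counts into it without overflowing at 255.
 * Struct names are hypothetical, for illustration only.
 */

/* RHEL5 2.6.18-era declaration, abridged to the relevant field: */
struct bio_rhel5_before {
	unsigned short	bi_phys_segments;	/* only 8 bits per packed count */
};

/* After carrying "block: make bi_phys_segments an unsigned int instead of
 * short" (commit 5b99c2ffa980): */
struct bio_rhel5_after {
	unsigned int	bi_phys_segments;	/* full 16 bits per packed count */
};
```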
| Comment by Yang Sheng [ 02/May/12 ] |
|
Patch committed to: http://review.whamcloud.com/#change,2625 |
| Comment by Yang Sheng [ 23/Aug/12 ] |
|
Patch landed, closing bug. |
| Comment by Emoly Liu [ 13/Nov/12 ] |
|
Port for b2_1 is here: http://review.whamcloud.com/#change,4526 |
| Comment by Emoly Liu [ 21/Nov/12 ] |
|
Port for b2_1 has been successfully cherry-picked as 96af312f068b642417cf1bba079822f4abb5723d. |