[LU-1665] hsb_head[0] in cfs_hash_bucket_t can experience poor alignment Created: 24/Jul/12 Updated: 23/Jan/13 Resolved: 23/Jan/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0, Lustre 2.1.2 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | chas williams - CONTRACTOR | Assignee: | Keith Mannthey (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
ia64 |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6360 |
| Description |
|
hsb_head[0] can experience poor alignment on certain platforms with odd alignment requirements like the ia64. it is possible for hsb_head[0] to be aligned to a 32-bit boundary, but later it could be cast to something with a stricter alignment requirement. this can/does cause hardware faults for unaligned access: Jul 18 19:25:20 seraph kernel: kernel unaligned access to 0xe0000069eef8f194, ip=0xa00000028342e0d1 |
| Comments |
| Comment by Peter Jones [ 24/Jul/12 ] |
|
Keith Can you please look into this one? Thanks Peter |
| Comment by Keith Mannthey (Inactive) [ 31/Jul/12 ] |
|
The char to long change is a low risk change overall but is isn't clear to me we are not just masking an underlaying issue. Do you know know what part of the code that accesses this data structure is causing the unaligned access? Where "ip=0xa00000028342e0d1" is? |
| Comment by chas williams - CONTRACTOR [ 01/Aug/12 ] |
|
for whatever reason, gcc doesnt quite align the structure optimally for ia64. it seems to aligning so that the data structure is compatible with x86/x86_64. try this program on x86_64: #include <stdio.h> struct long_then_charp { struct int_then_charp { unsigned int value; unsigned char string[0]; }; int main(int argc, char **argv) { struct long_then_charp *a; struct int_then_charp *b; unsigned long *anon; a = malloc(32); b = malloc(32); printf("&a.value %p\n", &a->value); printf("a.string[] %p\n", a->string); anon = (void *) a->string; *anon = 0xdeadbeefdeadbeef; printf("&b.value %p\n", &b->value); printf("b.string[] %p\n", b->string); anon = (void *) b->string; *anon = 0xdeadbeefdeadbeef; }when you run this you will see that a.string is aligned to an 8 byte boundary in one case and a 4 byte boundary in the other case. both of these will work on x86/x86_64 without complaint (although i imagine that the x64_64 might benefit from the long being aligned to an 8 byte boundary?) the kernel on ia64 issues a runtime message though for this program: % a.out so the b.string[] that was cast to the unsigned long and then accessed caused this fault. the program will run, but performance is somewhat reduced since the kernel has to handle this misalignment. in lustre, this specifically seems to be here: (gdb) info line *cfs_hash_bd_lookup_intent+0x80 that's here: { /* with this function, we can avoid a lot of useless refcount ops,
cfs_hash_bd_hhead() converts hsb_head[0] via the appropriate cfs_hash_hlist_ops_t to a cfs_hlist_head_t. apparently hhead winds up with a poor alignment for ia64 because of this. there are a couple other places, where this happens but it is basically the same. |
| Comment by Keith Mannthey (Inactive) [ 01/Aug/12 ] |
|
The reasoning seems good to me. Can you submit the patch in git? http://wiki.whamcloud.com/display/PUB/Submitting+Changes It makes the DCO and testing much easier. |
| Comment by Keith Mannthey (Inactive) [ 07/Aug/12 ] |
|
Do you have any questions about the process? |
| Comment by chas williams - CONTRACTOR [ 09/Aug/12 ] |
|
no. just busy this week. |
| Comment by Keith Mannthey (Inactive) [ 04/Jan/13 ] |
|
Without an update I will close this LU at the end of the Month. |
| Comment by chas williams - CONTRACTOR [ 07/Jan/13 ] |
|
i have a patch in gerrit http://review.whamcloud.com/#change,3603 i dont know if it was merged. build testing failed but i think the error was due to problems with the backend testing facility. |
| Comment by Peter Jones [ 07/Jan/13 ] |
|
Hi Chas Your change has not been merged yet. It has failed autotest due to an unrelated error. Usually the ticket owner will monitor contributed patches and take action if this happens but I imagine Keith was unaware of the patch. The convention is to post a comment with a link to any related patches so that those watching the JIRA ticket can find them. Keith Can you please see whether it is best to trigger a re-test on the patch with a no score build or else rebase the patch on the latest master code. Thanks Peter |
| Comment by Keith Mannthey (Inactive) [ 23/Jan/13 ] |
|
Sorry for that patch lost issue last year. The patch has just been landed to master. |