[LU-96] llite_loop.ko does not support >= 64k pages Created: 23/Feb/11 Updated: 21/Mar/11 Resolved: 21/Mar/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Christopher Morrone | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL6, ppc64 |
||
| Severity: | 4 |
| Rank (Obsolete): | 9789 |
| Description |
|
llite_loop.ko will not currently build on systems with >= 64k pages. 64k is apparently now the default page size in RHEL6 on ppc64 systems, so this may be an issue. For LLNL we really only need client support, so we will be fine in the short term if the loop device is simply disabled. But longer term, this problem in lustre/llite/lloop.c needs to be addressed:

    CLASSERT(CFS_PAGE_SIZE < (1 << (sizeof(unsigned short) * 8)));

Lustre sets blk_queue_logical_block_size to the page size, but because that parameter is an unsigned short (2 bytes), any value over 65535 will be truncated. |
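To make the truncation concrete, here is a minimal user-space sketch (the helper is hypothetical and only mirrors the implicit conversion; it is not Lustre or kernel code):

```c
#include <stdio.h>

/* The kernel setter of this era takes the block size as an unsigned short:
 *   void blk_queue_logical_block_size(struct request_queue *q,
 *                                     unsigned short size);
 * so a 64k page size is silently truncated on the way in.
 */
static unsigned short as_kernel_sees_it(unsigned int size)
{
	return (unsigned short)size;	/* same implicit conversion as the real call */
}

int main(void)
{
	unsigned int page_size = 64 * 1024;	/* RHEL6 ppc64 default */

	printf("requested logical block size: %u\n", page_size);
	printf("value after truncation:       %u\n", as_kernel_sees_it(page_size));
	/* prints 0, which is exactly what the CLASSERT above guards against */
	return 0;
}
```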
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 23/Feb/11 ] |
|
This problem was introduced by commit 5bae6efc7d2f8c117d96483cc11d1d428bd6abd4 ("b=22514 Update RHEL5.5 & OEL5.5 to latest kernel. --RHEL5 2.6.18-194.17.1.el5"). Investigating it. |
| Comment by Niu Yawei (Inactive) [ 24/Feb/11 ] |
|
Hi Xiong, hi Andreas |
| Comment by Niu Yawei (Inactive) [ 24/Feb/11 ] |
|
I have a silly question: why do we have to set the hardsect_size to PAGE_SIZE? Is there anything wrong with setting it to 4k on a 64k-page system? |
| Comment by Andreas Dilger [ 26/Feb/11 ] |
|
I don't recall if there was a requirement to keep the hardsect size equal to the PAGE_SIZE or not, but I suspect yes - possibly to avoid sub-page IO tracking. Please look at the commit logs and/or bugzilla to see if there are any such requirements. It would be trivial to patch the client kernel to make the hardsect size an int instead of a short, but for general use we do not patch the client kernel anymore, so that patch would either need to be maintained at LLNL and/or submitted upstream to Red Hat. |
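For reference, a sketch of the kind of one-line widening being described, assuming the 2.6.32-era block layer names (the exact RHEL6 hunks may differ):

```c
#include <linux/blkdev.h>

/*
 * Sketch only, not the committed patch. The stock setter takes the
 * size as an unsigned short:
 *
 *   void blk_queue_logical_block_size(struct request_queue *q,
 *                                     unsigned short size);
 *
 * Widening the parameter (and the matching queue_limits field) to an
 * unsigned int lets a 64k value through intact:
 */
void blk_queue_logical_block_size(struct request_queue *q, unsigned int size)
{
	q->limits.logical_block_size = size;	/* field widened to match */
}
```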
| Comment by Niu Yawei (Inactive) [ 28/Feb/11 ] |
|
Hi Andreas, Xiong,

I checked bugzilla (b5498) and did not find such a requirement, nor did I find any sub-page IO tracking problem in the code. I asked Xiong (the original author of lloop) about this; he can't recall the exact reason either, so I speculate there is no special purpose behind it. Since lloop was introduced for swap, we never ran into trouble with this. However, if we want to use the lloop device as backing store for a local filesystem, we might hit trouble when formatting a filesystem with a block size smaller than PAGE_SIZE.

Xiong mentioned that clio can only support page-size-aligned direct IO, which means the lloop in 2.0 can only handle page-size-aligned IO (see the sketch after this comment). In detail: clio direct IO issues IO by cl_page, but a cl_page carries no offset/length (I guess because cl_page was invented for buffered IO), so it is hard to send a partial page to the OSS. I think that is an implementation defect in clio direct IO (and it has nothing to do with the hardsect_size), but I am not sure whether we plan to fix it. (Xiong, please correct me if I'm wrong.)

Given the assumption of no heavy change to lloop, I see two options: set the hardsect_size to a fixed 4k regardless of PAGE_SIZE, or patch the client kernel to widen the hardsect size as Andreas described. I think the first solution doesn't require maintaining kernel patches, so it should be easier for customers. Any comments? |
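To illustrate the alignment constraint Niu describes, here is a hedged sketch (the helper name and check are illustrative, not code from lloop.c):

```c
#include <linux/bio.h>
#include <linux/mm.h>	/* PAGE_SIZE, PAGE_MASK */

/* Illustrative helper, not from lloop.c: a request is serviceable only
 * if both its starting offset and its length are page aligned, because
 * clio direct IO has no way to describe a partial cl_page to the OSS.
 */
static int lloop_request_page_aligned(struct bio *bio)
{
	loff_t offset = (loff_t)bio->bi_sector << 9;	/* sectors are 512 bytes */

	return !(offset & ~PAGE_MASK) && !(bio->bi_size & ~PAGE_MASK);
}
```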
| Comment by Peter Jones [ 07/Mar/11 ] |
|
Jay/Andreas - are you able to comment? Thanks |
| Comment by Andreas Dilger [ 07/Mar/11 ] |
|
I am OK with simply disabling the loop device for now, either for the PPC client or for 64k PAGE_SIZE (whichever is easier). This code is only marginally tested, if at all, and I'd rather it just be disabled on that arch until we have a need to use it there. |
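One way to express that, as a hedged sketch (the guard shown is illustrative; the actual change could equally live in autoconf or the build system):

```c
/* Illustrative compile-time guard, not the committed fix: build the
 * lloop device only when the page size fits in the driver's unsigned
 * short block size. sizeof() cannot be used in #if, so the 64k limit
 * is spelled out.
 */
#if CFS_PAGE_SIZE < 65536
/* ... existing lloop implementation ... */
#else
#warning "llite_lloop disabled: PAGE_SIZE >= 64k overflows the hardsect size"
#endif
```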
| Comment by Jinshan Xiong (Inactive) [ 07/Mar/11 ] |
|
Let's fix this after the kernel has solved the 'unsigned short' issue. Someone else has hit a similar problem and has a patch; hopefully we will see it in the mainline kernels. |
| Comment by Niu Yawei (Inactive) [ 07/Mar/11 ] |
|
Thanks, Andreas and Jay. Let's just disable it for now (I think Chris and Brian are already working on this). |
| Comment by Peter Jones [ 14/Mar/11 ] |
|
ok, so if the approach is to avoid this issue and wait for an upstream fix, can this ticket be marked as resolved? |