[LU-96] llite_loop.ko does not support >= 64k pages Created: 23/Feb/11  Updated: 21/Mar/11  Resolved: 21/Mar/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Christopher Morrone Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL6, ppc64


Severity: 4
Rank (Obsolete): 9789

 Description   

llite_loop.ko will not currently build on systems with >= 64k pages. 64k is apparently now the default in RHEL6 on ppc64 systems, so this may be an issue. For LLNL, we really only need client support, so we will be fine in the short term if LU-94 is integrated.

But longer term, this problem in lustre/llite/lloop.c needs to be addressed:

CLASSERT(CFS_PAGE_SIZE < (1 << (sizeof(unsigned short) * 8)));
blk_queue_logical_block_size(lo->lo_queue,
                             (unsigned short)CFS_PAGE_SIZE);

Lustre sets the queue's logical block size to the page size, but because the value is passed as an unsigned short (2 bytes), any number over 65535 will be truncated. With 64k pages, CFS_PAGE_SIZE is 65536, so the CLASSERT above fails at compile time.
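The truncation described above can be reproduced in plain user-space C. This is only a minimal sketch: the function names are hypothetical and not part of lloop.c or the kernel, and uint16_t is used as a stand-in for the 2-byte unsigned short mentioned in the ticket.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a setter that takes a 16-bit unsigned value.
 * Converting to uint16_t reduces the value modulo 65536. */
uint16_t narrow_block_size(unsigned long size)
{
    return (uint16_t)size;
}

/* Same condition as the CLASSERT in the ticket, evaluated at run time:
 * true only if the size fits in 16 bits. */
int fits_in_16_bits(unsigned long size)
{
    return size < (1UL << (sizeof(uint16_t) * 8));
}

/* Run-time analogue of the CLASSERT: refuse sizes that would be truncated
 * instead of silently narrowing them. */
uint16_t checked_block_size(unsigned long size)
{
    assert(fits_in_16_bits(size));
    return narrow_block_size(size);
}
```

A 4k page size passes through unchanged, while a 64k page size (65536) does not fit in 16 bits and would narrow to 0, which is why the compile-time assertion exists.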



 Comments   
Comment by Jinshan Xiong (Inactive) [ 23/Feb/11 ]

This problem was introduced by:

commit 5bae6efc7d2f8c117d96483cc11d1d428bd6abd4
Author: yangsheng <sheng.yang@oracle.com>
Date: Thu Oct 21 16:23:42 2010 +0800

b=22514 Update RHEL5.5 & OEL5.5 to latest kernel.

--RHEL5 2.6.18-194.17.1.el5.
--OEL5 2.6.18-194.17.1.0.1.el5.
--Switch using 'inkernel' OFED stack.
--Build fixes for ppc64 & ia64.

Investigating it

Comment by Niu Yawei (Inactive) [ 24/Feb/11 ]

Hi, Xiong
I don't see how this patch can fix the problem, since 'hardsect_size'/'logical_block_size' is an unsigned short in the kernel.

Hi, Andreas
Do you have any thoughts or ideas about a long-term solution for this issue? I'll take it as a lower-priority job. Thank you.

Comment by Niu Yawei (Inactive) [ 24/Feb/11 ]

I have a silly question: why do we have to set the hardsect_size to PAGE_SIZE? Is there anything wrong with setting it to 4k on a 64k page system?

Comment by Andreas Dilger [ 26/Feb/11 ]

I don't recall if there was a requirement to keep the hardsect size equal to the PAGE_SIZE or not, but I suspect yes - possibly to avoid sub-page IO tracking. Please look at the commit logs and/or bugzilla to see if there are any such requirements.

It would be trivial to patch the client kernel to make hardsect size an int instead of a short, but for general use we do not patch the client kernel anymore, so that would either need to be maintained at LLNL and/or submitted upstream to RedHat.

Comment by Niu Yawei (Inactive) [ 28/Feb/11 ]

Hi, Andreas, Xiong

I checked bugzilla (b5498) and didn't find any such requirement, and I didn't find any sub-page IO tracking problem in the code. I asked Xiong (the original author of lloop) about this, and he can't recall the exact reason either, so I speculate that there isn't any special purpose for this. Since lloop was introduced for swap, we haven't had any trouble with this. However, if we want to use an lloop device as the backing store for a local filesystem, we might run into trouble when formatting a filesystem with a block size smaller than PAGE_SIZE.

Xiong mentioned that clio can only support page-size-aligned direct IO, which means the lloop in 2.0 can only handle page-size-aligned IO. In detail: clio direct IO issues IO by cl_page, but cl_page doesn't have an offset/length (I guess because cl_page was invented for buffered IO), so it's hard to send a partial page to the OSS. I think that is an implementation defect in clio direct IO (and it has nothing to do with the hardsect_size), but I'm not sure if we plan to fix it. (Xiong, please correct me if I'm wrong.)
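To make the alignment constraint concrete, here is a small user-space sketch of the kind of check implied by "page-size-aligned IO". The function name and signature are hypothetical and are not taken from clio or lloop; it only illustrates why a request smaller than a page cannot be expressed in whole pages.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if an IO request covers whole pages only,
 * i.e. both its starting offset and its length are multiples of page_size. */
int io_is_page_aligned(unsigned long long offset,
                       unsigned long long length,
                       unsigned long page_size)
{
    assert(page_size != 0);   /* a zero page size would divide by zero */
    return (offset % page_size == 0) && (length % page_size == 0);
}
```

For example, on a 64k page system a 4k write at offset 4096 is not page aligned, so under the limitation described above it could not be issued as a set of whole cl_page units.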

Given the assumption of no heavy changes to lloop, I think:

  • if we only use lloop as a swap device, then it's safe to always set the hardsect_size to 4k (on both 64k and 4k page systems);
  • if we want to use lloop as the backing store for some other local fs, it's better to set hardsect_size to a smaller value, and for 2.0, the clio direct IO limitation needs to be fixed.

I think the first solution doesn't require maintaining kernel patches, so it should be easier for the customer. Any comments?
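The first option could be sketched roughly as follows. This is an illustration only, not the change that was made to lloop.c: the constant and function names are hypothetical, and capping at the page size (rather than hard-coding 4k) is an added assumption so that the value never exceeds one page.

```c
#include <assert.h>

/* Hypothetical fixed sector size for the loop device: 4k. */
#define LLOOP_SECTOR_SIZE 4096UL

/* Pick a sector size that always fits in an unsigned short:
 * 4k, or the page size if the page is smaller than 4k. */
unsigned short lloop_sector_size(unsigned long page_size)
{
    unsigned long size;

    assert(page_size > 0);
    size = page_size < LLOOP_SECTOR_SIZE ? page_size : LLOOP_SECTOR_SIZE;
    return (unsigned short)size;   /* at most 4096, so no truncation */
}

/* Number of sectors that make up one page, e.g. 16 for 64k pages. */
unsigned long lloop_sectors_per_page(unsigned long page_size)
{
    return page_size / lloop_sector_size(page_size);
}
```

With this choice the value passed to the block layer is 4096 on both 4k and 64k page systems, at the cost of the device seeing sub-page sectors on the latter.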

Comment by Peter Jones [ 07/Mar/11 ]

Jay/Andreas - are you able to comment? Thanks.

Comment by Andreas Dilger [ 07/Mar/11 ]

I am ok with simply disabling the loop device for the PPC client for now, or 64k PAGE_SIZE (whichever is easier). This code is only marginally tested, if at all, and I'd rather it just be disabled on that arch until we have a need to use it there.

Comment by Jinshan Xiong (Inactive) [ 07/Mar/11 ]

Let's fix this after the kernel has solved the 'unsigned short' issue. Someone else has met a similar problem and has a patch; hopefully we will see it in the mainline kernels.

Comment by Niu Yawei (Inactive) [ 07/Mar/11 ]

Thanks, Andreas and Jay. Let's just disable it for now (I think Chris and Brian are working on this in LU-94).

Comment by Peter Jones [ 14/Mar/11 ]

ok, so if the approach is to avoid this issue and wait for an upstream fix, can this ticket be marked as resolved?
