[LU-1423] 16K pagesize clients error during ls Created: 18/May/12  Updated: 23/Jul/12  Resolved: 23/Jul/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.1, Lustre 1.8.6
Fix Version/s: Lustre 2.3.0, Lustre 2.1.3

Type: Bug Priority: Minor
Reporter: Mahmoud Hanafi Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None
Environment:

clients = ia64 (16Kpagesize) lustre-1.8.6.81-5.1nas
server = x86(4kpagesize) lustre-2.1.1


Severity: 2
Rank (Obsolete): 4558

 Description   

Mounting a Lustre 2.1.1 filesystem on a 1.8.6 client and running ls fails with the following errors in the client logs.

LustreError: 2551436:0:(mdc_request.c:983:mdc_readpage()) Unexpected # bytes transferred: 4096 (16384 expected)
LustreError: 2556239:0:(mdc_request.c:983:mdc_readpage()) Unexpected # bytes transferred: 4096 (16384 expected)
LustreError: 2556239:0:(dir.c:949:ll_readdir_20()) error reading dir [0x12480001:0xc1bc1b0c:0x0] at 0: rc -71
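
For reference, rc -71 is -EPROTO on Linux, and the message text suggests the client compares the number of bytes received in the bulk transfer against its own page size. A minimal sketch of that kind of check follows. The names check_readpage_transfer() and CLIENT_PAGE_SIZE are hypothetical and used only for illustration; this is not the actual mdc_readpage() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Assumption: the ia64 client in this report uses 16 KB pages. */
#define CLIENT_PAGE_SIZE 16384

/*
 * Hypothetical helper, not the real Lustre function: return 0 when the
 * bulk transfer delivered exactly one client page, -EPROTO otherwise.
 */
static int check_readpage_transfer(int bytes_transferred)
{
    /* A negative byte count would indicate a caller bug in this sketch. */
    assert(bytes_transferred >= 0);

    if (bytes_transferred != CLIENT_PAGE_SIZE) {
        fprintf(stderr, "Unexpected # bytes transferred: %d (%d expected)\n",
                bytes_transferred, CLIENT_PAGE_SIZE);
        return -EPROTO;
    }
    return 0;
}
```

Under these assumptions, a server that sends one 4096-byte page makes check_readpage_transfer(4096) return -EPROTO, which is consistent with the rc reported by ll_readdir_20() above.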



 Comments   
Comment by Peter Jones [ 18/May/12 ]

YangSheng

Could you please look into this one?

Thanks

Peter

Comment by Mahmoud Hanafi [ 21/May/12 ]

Could I get an update on this?
Thanks,
Mahmoud

Comment by Jay Lan (Inactive) [ 22/May/12 ]

The kernel in the IA64 system is a sles10sp2 kernel 2.6.16.60-0.77.1-default.
The kernel config sets CONFIG_IA64_PAGE_SIZE_16KB=y.

Comment by Jay Lan (Inactive) [ 22/May/12 ]

We also mounted a Lustre 1.8.6 server on the IA64 client and it worked. So it appears the problem is on the 2.1.1 server side.

Comment by Andreas Dilger [ 29/May/12 ]

This likely relates to the changes made for 1MB readdir RPCs on 2.x, but the compatibility code on 1.8 was not updated correctly. Possibly the server needs to reply with the requested RPC size for old clients that do not handle OBD_CONNECT_BRW_SIZE, rather than hard-coding the 4096-byte lu_page size.
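
The suggestion above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch that later landed: SKETCH_CONNECT_BRW_SIZE (including its made-up bit value) and readdir_reply_bytes() are hypothetical stand-ins, and default_bytes represents whatever size the existing server path would otherwise use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for OBD_CONNECT_BRW_SIZE; the bit value is made up. */
#define SKETCH_CONNECT_BRW_SIZE (1ULL << 0)

/*
 * Hypothetical helper: choose how many bytes the server should send for a
 * readdir page request.
 *   client_flags    - connect flags advertised by the client
 *   requested_bytes - size the client asked for (its own page size)
 *   default_bytes   - size the existing server path would send
 */
static uint32_t readdir_reply_bytes(uint64_t client_flags,
                                    uint32_t requested_bytes,
                                    uint32_t default_bytes)
{
    assert(requested_bytes > 0);

    /* Old client without the flag: honor exactly what it requested. */
    if (!(client_flags & SKETCH_CONNECT_BRW_SIZE))
        return requested_bytes;

    /* Client that understands the flag: keep the existing behavior. */
    return default_bytes;
}
```

With these assumptions, a 1.8 client that requests 16384 bytes and does not advertise the flag would be sent 16384 bytes instead of a hard-coded 4096.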

Comment by Jay Lan (Inactive) [ 29/May/12 ]

When the server and client have different page sizes, should they settle on the higher or the lower value?

Comment by Yang Sheng [ 30/May/12 ]

Hi Jay, could you tell me what network type is used between the client and server?

Comment by Jay Lan (Inactive) [ 30/May/12 ]

We use IB. ipoib on ib0 and rdma on ib1.

Comment by Yang Sheng [ 31/May/12 ]

This issue is related to change: http://review.whamcloud.com/#change,604

For the fix, we need to check OBD_CONNECT_BRW_SIZE on the server and return a single page for old clients. So it looks like we need to restore some of the code that was removed by the above change.

Comment by Yang Sheng [ 03/Jun/12 ]

Patch uploaded to: http://review.whamcloud.com/3014

Comment by Mahmoud Hanafi [ 06/Jun/12 ]

We installed the patch and got an LBUG!

Lustre: 3692:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGS->NET_0x500000a9719cf_UUID netid 50000: select flavor null
Lustre: 3692:0:(sec.c:1474:sptlrpc_import_sec_adapt()) Skipped 37 previous similar messages
LustreError: 3761:0:(mdt_handler.c:1247:mdt_sendpage()) ASSERTION(desc->bd_nob == nob) failed
LustreError: 3761:0:(mdt_handler.c:1247:mdt_sendpage()) LBUG
Pid: 3761, comm: mdt_rdpg_01

Call Trace:
[<ffffffffa05e3855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa05e3e95>] lbug_with_loc+0x75/0xe0 [libcfs]

Entering kdb (current=0xffff880c1f4fa0c0, pid 3761) on processor 0 Oops: (null)

Comment by Yang Sheng [ 06/Jun/12 ]

Patch updated. Please retry, and sorry for the inconvenience.

Comment by Jay Lan (Inactive) [ 14/Jun/12 ]

The latest patch seems to be working for us. Mahmoud tested it.

Comment by Yang Sheng [ 20/Jul/12 ]

Hi Jay, could we close this ticket? Please advise.

Comment by Jay Lan (Inactive) [ 20/Jul/12 ]

The patch landed on the b2_1 tree 3 days ago, so please close this ticket.

Comment by Yang Sheng [ 23/Jul/12 ]

Patch landed. Closing bug.
