[LU-8172] match 1 length 172 too big: 160 left, 160 allowed Created: 19/May/16  Updated: 24/May/16  Resolved: 24/May/16

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: Doug Oucharek (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

2.7.1 with LU-3322


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Running a simple lnet_selftest between 2 nodes failed with this error:

00000400:00020000:13.0F:1463699033.377705:0:8764:0:(lib-ptl.c:190:lnet_try_match_md()) Matching packet from 12345-10.151.27.60@o2ib, match 1 length 172 too big: 160 left, 160 allowed
00000400:00000100:13.0:1463699033.420950:0:8764:0:(lib-move.c:1479:lnet_parse_put()) Dropping PUT from 12345-10.151.27.60@o2ib portal 51 match 1 offset 0 length 172: 4

It works within the same node.
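
For triage, the sizes in the "too big" message above can be pulled out and compared mechanically. The following is a minimal sketch, not part of LNet or lst: the regular expression is written against the message text quoted in this ticket only, and the names LOG_PATTERN and parse_too_big are hypothetical.

```python
import re

# Pattern written against the lnet_try_match_md() message quoted in this
# ticket; other Lustre versions may word the message differently.
LOG_PATTERN = re.compile(
    r"match (?P<match>\d+) length (?P<length>\d+) too big: "
    r"(?P<left>\d+) left, (?P<allowed>\d+) allowed"
)


def parse_too_big(line):
    """Return the numeric fields of a 'too big' line, or None if no match."""
    m = LOG_PATTERN.search(line)
    if m is None:
        return None
    fields = {name: int(value) for name, value in m.groupdict().items()}
    # Bytes by which the incoming message exceeds what the receiver accepts.
    fields["excess"] = fields["length"] - min(fields["left"], fields["allowed"])
    return fields


if __name__ == "__main__":
    sample = (
        "(lib-ptl.c:190:lnet_try_match_md()) Matching packet from "
        "12345-10.151.27.60@o2ib, match 1 length 172 too big: "
        "160 left, 160 allowed"
    )
    print(parse_too_big(sample))
```

For the line in this ticket, the excess is 172 - 160 = 12 bytes.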



 Comments   
Comment by Jay Lan (Inactive) [ 20/May/16 ]

I rebased our nas-2.7.1-fe git repo with b2_7_fe on 5/12.
LU-3322 was pulled in with that rebase. We did not have the problem before the rebase.

There are some LU patches we carry not in b2_7_fe:

LU-7808 lnet: Debug patch to investigate MD Unlink
LU-7733 ptlrpc: print times in microseconds
LU-7795 quota: tuneable soft least qunit
LU-6433 quota: handle QUOTA_DQACQ in READPAGE portal
LU-7569 o2iblnd: avoid intensive reconnecting
LU-7210 o2iblnd: take extra refcount in kiblnd_connreq_done
LU-7054 o2iblnd: less intense allocating retry
Revert "LU-7054 o2iblnd: less intense allocating retry"
LU-7372 mgs: reprocess all locks at device fini
LU-5704 utils: stop open hangs on fifo files
LU-7054 o2iblnd: less intense allocating retry
LU-7506 lfs: "lfs quota -h" should support petabytes output

The git repo can be accessed on GitHub. Peter Jones's account has access to it.
https://github.com/NASAEarthExchange/lustre-nas-fe/commits/nas-2.7.1

Comment by Peter Jones [ 20/May/16 ]

Doug

Could you please advise?

Thanks

Peter

Comment by Doug Oucharek (Inactive) [ 20/May/16 ]

Is the exact same build running on all the nodes involved in the self test? My interpretation of the error is that the message size we are anticipating is 12 bytes less than the amount which was sent.

Comment by Mahmoud Hanafi [ 21/May/16 ]

Are there any issues with running a 2.7.1+LU-3322 server and 2.5.3 or 2.7.1 non-LU-3322 clients?

Comment by Mahmoud Hanafi [ 21/May/16 ]

There is an lst incompatibility between LU-3322 builds and non-LU-3322 builds. This was the issue I was having.

Should lst work with a build without LU-3322 and one with LU-3322?

Comment by Jay Lan (Inactive) [ 21/May/16 ]

Mahmoud, our nas-2.7.1-4.1nasS server and nas-2.7.1-4nasC client both have LU-3322.
The nas-2.5.3 client does not have LU-3322.

Comment by Mahmoud Hanafi [ 21/May/16 ]

After trying out different versions, it looks like "LU-7808 lnet: Debug patch to investigate MD Unlink" is where it breaks.

Comment by Peter Jones [ 21/May/16 ]

Mahmoud

I recall that this patch was quite different between the two versions. Do you have the appropriate version on both releases? Does everything work fine if you remove it altogether?

Peter

Comment by Jay Lan (Inactive) [ 24/May/16 ]

Peter,

We have correct versions of the LU-7808 lnet debug patches.
The b2_5_fe (#19643, patch set 4) and b2_7_fe (#19916, patch set 1) versions are different.

Comment by Doug Oucharek (Inactive) [ 24/May/16 ]

The 3 new stats I added to the debug patch for LU-7808 add up to 12 bytes and are traded between the two nodes during an lnet-selftest. So it looks like one side is sending the new 3 stats but the other side is not expecting them. I checked the two patches and they both should be sending/receiving the new 3 stats.

As well as LNet, the debug patches for LU-7808 changed the lnet-selftest module. Has the updated lnet-selftest module been installed on both nodes?
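
The 12-byte figure can be checked with simple arithmetic. The sketch below is an illustration only: it assumes each of the 3 new stats is a 32-bit (4-byte) unsigned integer, which is consistent with "add up to 12 bytes" but is not confirmed in this ticket, and it does not reproduce the real lnet-selftest structures. The name grown_size is hypothetical.

```python
import struct

OLD_PAYLOAD_BYTES = 160  # size the receiver allowed, per the log message
NEW_STAT_COUNT = 3       # stats added by the LU-7808 debug patch
NEW_STAT_FORMAT = "<I"   # assumption: each new stat is a 32-bit unsigned int


def grown_size(old_size, new_fields, field_format):
    """Payload size after appending new_fields fields of the given format."""
    return old_size + new_fields * struct.calcsize(field_format)


if __name__ == "__main__":
    # A sender with the patched module emits the larger payload; a receiver
    # with the unpatched module still posts a buffer of the old size.
    print(grown_size(OLD_PAYLOAD_BYTES, NEW_STAT_COUNT, NEW_STAT_FORMAT))
```

Under that assumption the patched payload is 160 + 3 * 4 = 172 bytes, which matches the length reported in the error.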

Comment by Mahmoud Hanafi [ 24/May/16 ]

It does look like this was the main issue: the lst versions didn't match on all sides.

You can close this issue.
