[LU-10002] client_bulk_callback() event type 1, status -5, desc ffff881fe5a8f800 Created: 18/Sep/17  Updated: 21/Sep/17  Resolved: 21/Sep/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Olaf Faaland Assignee: Sonia Sharma (Inactive)
Resolution: Fixed Votes: 0
Labels: llnl
Environment:

ssh://review.whamcloud.com/fs/lustre-release-fe-llnl
Client is lustre-2.8.0_11.chaos
Server is lustre-2.5.5-13chaos
both x86_64


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Repeated console log entries like this on a client while running mdtest:

Lustre: lcy-OST000c-osc-ffff883ff7d0e800: Connection restored to 10.1.1.183@o2ib9 (at 10.1.1.183@o2ib9)
Lustre: Skipped 17 previous similar messages
mlx5_0:dump_cqe:262:(pid 5096): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 08007806 25000230 001344d2
LustreError: 5099:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff881fe5a8b800
LustreError: 5098:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff881fd8390800
mlx5_0:dump_cqe:262:(pid 5097): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 08007806 25000231 00044ad2

These entries are accompanied by reconnects to the servers.

This issue occurs when running with the patches backported under LU-9932.
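
For triage, the client_bulk_callback() lines in the console log above can be pulled out and counted with a short script. The following is only a sketch: the regular expression is modeled on the two LustreError lines quoted in this ticket, and the function name, field names, and SAMPLE_LOG list are hypothetical, not part of Lustre. On Linux, status -5 corresponds to errno EIO.

```python
import errno
import re

# Pattern modeled on the LustreError lines quoted in this ticket.
# The group names (pid, event_type, status, desc) are our own labels.
BULK_CB_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:\(events\.c:\d+:client_bulk_callback\(\)\) "
    r"event type (?P<event_type>\d+), status (?P<status>-?\d+), desc (?P<desc>[0-9a-f]+)"
)


def parse_bulk_callback_errors(lines):
    """Return one dict per client_bulk_callback() error line found in `lines`."""
    results = []
    for line in lines:
        match = BULK_CB_RE.search(line)
        if match is None:
            continue  # not a client_bulk_callback() line
        status = int(match.group("status"))
        results.append({
            "pid": int(match.group("pid")),
            "event_type": int(match.group("event_type")),
            "status": status,
            # Kernel status codes are negated errno values, so -5 maps to EIO.
            "errno_name": errno.errorcode.get(-status, "UNKNOWN"),
            "desc": match.group("desc"),
        })
    return results


# Lines copied from the description above (hypothetical variable name).
SAMPLE_LOG = [
    "mlx5_0:dump_cqe:262:(pid 5096): dump error cqe",
    "LustreError: 5099:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff881fe5a8b800",
    "LustreError: 5098:0:(events.c:203:client_bulk_callback()) event type 1, status -5, desc ffff881fd8390800",
]

if __name__ == "__main__":
    for entry in parse_bulk_callback_errors(SAMPLE_LOG):
        print(entry["pid"], entry["errno_name"], entry["desc"])
```

Run against a saved console log, this gives a quick count of bulk transfer failures per descriptor, which may help confirm whether the errors stop after a patch is applied.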



 Comments   
Comment by Olaf Faaland [ 18/Sep/17 ]

I see the dump_cqe message is coming from the InfiniBand driver.

Comment by Peter Jones [ 19/Sep/17 ]

Sonia

Can you please advise?

Thanks

Peter

Comment by Sonia Sharma (Inactive) [ 19/Sep/17 ]

Hi Olaf,
Do you have nodes with mlx5 cards talking to nodes with mlx4 cards in your setup?
The dump cqe error is seen on the nodes with mlx5 cards. You would need the LU-8752 patch on the nodes with mlx5 cards. Can you check whether LU-8752 is included on these nodes?

Thanks

Comment by Olaf Faaland [ 19/Sep/17 ]

Hi Sonia,
Yes, we have nodes with mlx5 cards connecting to nodes with mlx4 cards, and no, we do not have LU-8752. We will get that patch added.
Thanks

Comment by Olaf Faaland [ 19/Sep/17 ]

That appears to have solved the issue. Thanks.

Comment by Peter Jones [ 21/Sep/17 ]

It seems that this issue can be closed.

Generated at Sat Feb 10 02:31:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.