[LU-16389] Lustre 2.12.9 ksocklnd crash with 100+GB ethernet Created: 12/Dec/22  Updated: 14/Aug/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.9
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: James A Simmons Assignee: Peter Jones
Resolution: Unresolved Votes: 0
Labels: ORNL
Environment:

RHEL8 running Lustre 2.12.9, which is nearly vanilla. This is a 200GiB Ethernet setup.


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Using a nearly vanilla Lustre 2.12.9 build, we see the following crash from time to time on our production 100GiB system:

kernel:LNetError: 6003:0:(socklnd_cb.c:1985:ksocknal_connect()) ASSERTION( (wanted & (1 << 3)) != 0 ) failed:
kernel:LNetError: 6003:0:(socklnd_cb.c:1985:ksocknal_connect()) LBUG
kernel:Kernel panic - not syncing: LBUG in interrupt.
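
For anyone decoding the assertion text: the check that fails is a plain bit test, which passes only when bit 3 of the `wanted` mask is set. The sketch below reproduces that test outside the kernel. The names given to the bit positions are an assumption based on the socklnd connection-type constants in the Lustre source tree and are not stated in this ticket, and the helper names `assertion_holds` and `decode_wanted` are hypothetical.

```python
# Bit index tested by the failing assertion: (wanted & (1 << 3)) != 0
ASSERTED_BIT = 3

# ASSUMPTION: names for the bit positions, taken from the socklnd
# connection-type constants; verify against the source tree in use.
ASSUMED_CONN_TYPES = {
    0: "SOCKLND_CONN_ANY",
    1: "SOCKLND_CONN_CONTROL",
    2: "SOCKLND_CONN_BULK_IN",
    3: "SOCKLND_CONN_BULK_OUT",
}


def assertion_holds(wanted):
    """Return True when the asserted expression would evaluate as true."""
    return (wanted & (1 << ASSERTED_BIT)) != 0


def decode_wanted(wanted):
    """List the assumed connection-type names whose bits are set in the mask."""
    return [
        name
        for bit, name in sorted(ASSUMED_CONN_TYPES.items())
        if wanted & (1 << bit)
    ]


if __name__ == "__main__":
    for wanted in (0b0000, 0b1000, 0b0110):
        print(
            "wanted=%s holds=%s types=%s"
            % (format(wanted, "#06b"), assertion_holds(wanted), decode_wanted(wanted))
        )
```

Under these assumptions, the LBUG means `ksocknal_connect()` reached the asserted line with bit 3 clear in `wanted`. The log above does not show the actual mask value.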



 Comments   
Comment by James A Simmons [ 03/Jan/23 ]

Any news?

Comment by Peter Jones [ 03/Jan/23 ]

James

This looks like a duplicate of LU-15137.

Peter

Comment by James A Simmons [ 03/Jan/23 ]

We are running with those patches.

Generated at Sat Feb 10 03:26:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.