[LU-11692] lustre kernel panic - (niobuf.c:330:ptlrpc_register_bulk()) LBUG
Created: 22/Nov/18
Updated: 28/Feb/19
Resolved: 28/Feb/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.2
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Campbell Mcleay (Inactive)
Assignee: Andreas Dilger
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-8573 IOR: niobuf.c:319:ptlrpc_register_bul... Resolved
duplicates LU-11647 niobuf.c:330:ptlrpc_register_bulk()) ... Resolved
Severity: 3

 Description   

We get the following error on occasion on our Lustre gateways (Lustre clients exporting filesystems over NFS for non-Lustre clients):

 

cmcl@foxtrot3 ~ -bash$
Message from syslogd@foxtrot3 at Nov 15 20:20:25 ...
kernel:LustreError: 3501:0:(niobuf.c:330:ptlrpc_register_bulk()) ASSERTION( desc->bd_md_count == 0 ) failed:

Message from syslogd@foxtrot3 at Nov 15 20:20:25 ...
kernel:LustreError: 3501:0:(niobuf.c:330:ptlrpc_register_bulk()) LBUG

Message from syslogd@foxtrot3 at Nov 15 20:20:25 ...
kernel:Kernel panic - not syncing: LBUG
packet_write_wait: Connection to 10.21.22.32 port 22: Broken pipe
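
For anyone who wants to watch for this failure across several gateways, the console messages above follow a regular shape (pid, source file, line number, function, message), so they can be picked out of a log mechanically. The following is only a minimal sketch based on the three lines pasted above; the pattern and the helper name `find_lbugs` are illustrative assumptions, not part of Lustre or any Lustre tool, and other message formats may not match.

```python
import re

# Assumed pattern, derived only from the messages quoted in this ticket:
#   LustreError: <pid>:<n>:(<file>:<line>:<function>()) <message>
LUSTRE_ERROR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*)"
)


def find_lbugs(lines):
    """Return (pid, file, line, func) for each LustreError line whose message is LBUG."""
    hits = []
    for text in lines:
        match = LUSTRE_ERROR.search(text)
        if match and match.group("msg").strip() == "LBUG":
            hits.append(
                (
                    int(match.group("pid")),
                    match.group("file"),
                    int(match.group("line")),
                    match.group("func"),
                )
            )
    return hits


# The three kernel messages from the description, used as sample input.
sample = [
    "kernel:LustreError: 3501:0:(niobuf.c:330:ptlrpc_register_bulk()) "
    "ASSERTION( desc->bd_md_count == 0 ) failed:",
    "kernel:LustreError: 3501:0:(niobuf.c:330:ptlrpc_register_bulk()) LBUG",
    "kernel:Kernel panic - not syncing: LBUG",
]

print(find_lbugs(sample))
```

Only the second sample line is reported: the assertion line carries a different message, and the panic line is not a LustreError message at all.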

 



 Comments   
Comment by Peter Jones [ 22/Nov/18 ]

cmcl, which version of Lustre are you running?

Comment by Campbell Mcleay (Inactive) [ 22/Nov/18 ]

Oops sorry, should have put that in:

2.10.2-1.el7

kernel is: kernel-3.10.0-693.5.2.el7_lustre

Comment by Andreas Dilger [ 22/Nov/18 ]

The patch in LU-11647 should fix this issue. It is still undergoing testing and has not yet landed for any release. The patch is https://review.whamcloud.com/33167 "LU-11647 ptlrpc: race with reply_in_callback".

Since this is a client-only patch, you could install it on one or more of the NFS-exporting clients to determine whether it resolves the problem for you.

Comment by Campbell Mcleay (Inactive) [ 28/Feb/19 ]

Just some feedback: there have not been any crashes on clients since installing the patched software.

Comment by Peter Jones [ 28/Feb/19 ]

OK, then I think it is fine to close this ticket as a duplicate of LU-11647.

Comment by Andreas Dilger [ 28/Feb/19 ]

Reopen to fix resolution type.
