[LU-14888] We are uncertain that we may hit the bug Created: 27/Jul/21  Updated: 29/Jul/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.3
Fix Version/s: None

Type: Question/Request Priority: Major
Reporter: Hong Kong University Assignee: Peter Jones
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Client : 2.12.2
Server : 2.12.3, from HPE Lustre



 Description   

We are seeing an issue in which OSTs are frequently disconnected from the clients.

Our Lustre servers are OSS1-4 and MDS1-2, running the Lustre 2.12.3 release provided by HPE.

Our clients are running Lustre client 2.12.2.


On OSS1, we noticed many disconnections and reconnections of Lustre clients from various OSTs as shown below.

In particular, a bulk IO read error was reported for the client at 192.168.3.182 (NFS2).

 

NFS2 also came under very high load on the morning of 15 Jun, and we rebooted it.

Since then it has been unable to mount any Lustre file system, with the error below in NFS2's dmesg

 

 

During the investigation, we collected sosreports from MDS1-2, OSS1-2, and one of the compute nodes that is reporting OST disconnections. They are available at the link below

https://drive.google.com/open?id=1_tR7DiXCjzXWEd5ctPjPA5NFn_FweFvq

 

-------------

We suspect that we have hit the bug below, LU-13719, and we would like to confirm whether that is the case.

https://jira.whamcloud.com/browse/LU-13719



 Comments   
Comment by Peter Jones [ 28/Jul/21 ]

To be clear - do you mean that this is for the HPE Clusterstor distribution or just the unpatched 2.12.3 that HPE passed on to you?

Comment by Hong Kong University [ 29/Jul/21 ]

Scalable_Storage_with_Lustre_2.12.3_for_Gen9_and_Gen10_systems_P9L65-10015.

Comment by Peter Jones [ 29/Jul/21 ]

I am not familiar with that at all so cannot offer authoritative advice but, generally speaking, getting a more current version from HPE could be advantageous.
