[LU-14888] We suspect we may be hitting a known bug Created: 27/Jul/21 Updated: 29/Jul/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.3 |
| Fix Version/s: | None |
| Type: | Question/Request | Priority: | Major |
| Reporter: | Hong Kong University | Assignee: | Peter Jones |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: | Client: 2.12.2 |
| Description |
|
We are seeing an issue where Lustre clients are frequently being disconnected from OSTs. Our servers are OSS1-4 and MDS1-2, running the 2.12.3 release provided by HPE; our clients run the 2.12.2 Lustre client.
On OSS1, we noticed many disconnections and reconnections of Lustre clients from various OSTs, as shown below. In particular, a bulk IO read error was reported for the client at 192.168.3.182 (NFS2).
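As a rough sketch (generic Lustre client commands, not commands or output taken from this system), the per-OST connection state on an affected client can be inspected like this:

    # list the configured OSC devices on the client
    lctl dl
    # show the import state history for each OST; repeated DISCONN/CONNECTING
    # entries correspond to the disconnect/reconnect cycles described above
    lctl get_param osc.*.state
    # quick check that every OST is currently reachable from this client
    lfs check osts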
NFS2 itself came under very heavy load on the morning of 15 Jun, and we rebooted it. Since then it has been unable to mount any Lustre file systems, failing with the error below in NFS2's dmesg.
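As a hedged sketch of how the failure could be re-tested and narrowed down from NFS2 (the NID, filesystem name, and mount point below are placeholders, not values from this ticket):

    # verify LNet connectivity from NFS2 to the MGS/OSS nodes
    lctl ping <mgs-nid>                      # NID of the form 192.168.x.x@tcp
    # retry the mount and capture the resulting kernel messages
    mount -t lustre <mgs-nid>:/<fsname> /mnt/lustre
    dmesg | tail -n 50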
Upon investigation, we collected sosreports from MDS1-2, OSS1-2, and one of the compute nodes that is reporting OST disconnections; they are available at the link below: https://drive.google.com/open?id=1_tR7DiXCjzXWEd5ctPjPA5NFn_FweFvq
-------------
We suspect we are hitting the bug described here: https://jira.whamcloud.com/browse/LU-13719 |
| Comments |
| Comment by Peter Jones [ 28/Jul/21 ] |
|
To be clear - do you mean that this is the HPE ClusterStor distribution, or just the unpatched 2.12.3 that HPE passed on to you? |
| Comment by Hong Kong University [ 29/Jul/21 ] |
|
Scalable_Storage_with_Lustre_2.12.3_for_Gen9_and_Gen10_systems_P9L65-10015. |
| Comment by Peter Jones [ 29/Jul/21 ] |
|
I am not familiar with that at all so cannot offer authoritative advice but, generally speaking, getting a more current version from HPE could be advantageous. |