Details
-
Bug
-
Resolution: Fixed
-
Major
-
None
-
Lustre 2.2.0
-
None
-
----------------------------------------------------------------------------------------------------
## MDS HW ##
----------------------------------------------------------------------------------------------------
Linux XXXX.admin.cscs.ch 2.6.32-220.7.1.el6_lustre.g9c8f747.x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
Vendor ID: AuthenticAMD
CPU family: 16
64Gb RAM
Interconnect IB 40Gb/s
---
MDT LSI 5480 Pikes Peak
SSDs SLC
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
## OSS HW ##
----------------------------------------------------------------------------------------------------
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
Vendor ID: GenuineIntel
CPU family: 6
64Gb RAM
Interconnect IB 40Gb/s
---
OSTs ---> LSI 7900 SATA Disks
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
## Router nodes ##
----------------------------------------------------------------------------------------------------
12 Cray XE6 Service nodes as router nodes - IB 40Gb/s
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
## Clients ##
----------------------------------------------------------------------------------------------------
~ 1500 Cray XE6 nodes - Lustre 1.8.6
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
## LUSTRE Config ##
----------------------------------------------------------------------------------------------------
1 MDS + 1 fail over (MDT on SSD array)
12 OSSs - 6 OSTs per OSS (72 OSTs)
Luster Servers ---> 2.2.51.0
Lustre Clients ---> 1.8.6 (~1500 nodes) / 2.2.51.0 (~20 nodes)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ## MDS HW ## ---------------------------------------------------------------------------------------------------- Linux XXXX.admin.cscs.ch 2.6.32-220.7.1.el6_lustre.g9c8f747.x86_64 Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 16 Vendor ID: AuthenticAMD CPU family: 16 64Gb RAM Interconnect IB 40Gb/s --- MDT LSI 5480 Pikes Peak SSDs SLC ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- ## OSS HW ## ---------------------------------------------------------------------------------------------------- Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 32 Vendor ID: GenuineIntel CPU family: 6 64Gb RAM Interconnect IB 40Gb/s --- OSTs ---> LSI 7900 SATA Disks ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- ## Router nodes ## ---------------------------------------------------------------------------------------------------- 12 Cray XE6 Service nodes as router nodes - IB 40Gb/s ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- ## Clients ## ---------------------------------------------------------------------------------------------------- ~ 1500 Cray XE6 nodes - Lustre 1.8.6 ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- ## LUSTRE Config ## ---------------------------------------------------------------------------------------------------- 1 MDS + 1 fail over (MDT on SSD array) 12 OSSs - 6 OSTs per OSS (72 OSTs) Luster Servers ---> 2.2.51.0 Lustre Clients ---> 1.8.6 (~1500 nodes) / 2.2.51.0 (~20 nodes) ----------------------------------------------------------------------------------------------------
-
1
-
4042
Description
Dear Support,
we experienced some problem with our Lustre FS.
Our users complained that at the following times their jobs were killed due IO errors.
Saturday 09 June 05:58
Monday 11 June 12:47
We collected the logs from the servers and clients side and actually we saw a lot of messages/errors that we have problem to "decode".
Could you please help us to undestand why this problem arise ?
In the specific we don't understand if it's really a overload problem related to the hardware or configuration we used otherwise some congestion/bug issue...
Usually the FS seems to work correctly but suddently the log fill up of these messages.
Regards
Nicola