Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.2.0
-
None
-
## MDS HW ##
----------------------------------------------------------------------------------------------------
Linux XXXX.admin.cscs.ch 2.6.32-220.7.1.el6_lustre.g9c8f747.x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
Vendor ID: AuthenticAMD
CPU family: 16
64 GB RAM
Interconnect IB 40Gb/s
---
MDT LSI 5480 Pikes Peak
SSDs SLC
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
## OSS HW ##
----------------------------------------------------------------------------------------------------
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
Vendor ID: GenuineIntel
CPU family: 6
64 GB RAM
Interconnect IB 40Gb/s
---
OSTs ---> LSI 7900 SATA Disks
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
## Router nodes ##
----------------------------------------------------------------------------------------------------
12 Cray XE6 Service nodes as router nodes - IB 40Gb/s
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
## Clients ##
----------------------------------------------------------------------------------------------------
~ 1500 Cray XE6 nodes - Lustre 1.8.6
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
## LUSTRE Config ##
----------------------------------------------------------------------------------------------------
1 MDS + 1 fail over (MDT on SSD array)
12 OSSs - 6 OSTs per OSS (72 OSTs)
Lustre Servers ---> 2.2.51.0
Lustre Clients ---> 1.8.6 (~1500 nodes) / 2.2.51.0 (~20 nodes)
----------------------------------------------------------------------------------------------------
-
3
-
4502
Description
Lustre hung this morning; we are currently running e2fsck on the MDT.
After mounting the MDT as ldiskfs, we saw many large files named like 'oi.XX.XX'. What are these files?
Can you please help us debug the current problem?
[root@weisshorn01 mdt]# ls
capa_keys OBJECTS oi.16.15 oi.16.22 oi.16.3 oi.16.37 oi.16.44 oi.16.51 oi.16.59 oi.16.9
CATALOGS oi.16.0 oi.16.16 oi.16.23 oi.16.30 oi.16.38 oi.16.45 oi.16.52 oi.16.6 PENDING
CONFIGS oi.16.1 oi.16.17 oi.16.24 oi.16.31 oi.16.39 oi.16.46 oi.16.53 oi.16.60 ROOT
fld oi.16.10 oi.16.18 oi.16.25 oi.16.32 oi.16.4 oi.16.47 oi.16.54 oi.16.61 seq_ctl
last_rcvd oi.16.11 oi.16.19 oi.16.26 oi.16.33 oi.16.40 oi.16.48 oi.16.55 oi.16.62 seq_srv
lost+found oi.16.12 oi.16.2 oi.16.27 oi.16.34 oi.16.41 oi.16.49 oi.16.56 oi.16.63
lov_objid oi.16.13 oi.16.20 oi.16.28 oi.16.35 oi.16.42 oi.16.5 oi.16.57 oi.16.7
NIDTBL_VERSIONS oi.16.14 oi.16.21 oi.16.29 oi.16.36 oi.16.43 oi.16.50 oi.16.58 oi.16.8
[root@weisshorn01 mdt]# ls -l
total 1957836
-rw-r--r-- 1 root root 144 May 15 14:43 capa_keys
-rwx------ 1 root root 2304 May 15 14:49 CATALOGS
drwxrwxrwx 2 root root 4096 Jul 19 17:07 CONFIGS
-rw-r--r-- 1 root root 8192 May 15 14:43 fld
-rw-r--r-- 1 root root 392832 May 15 14:43 last_rcvd
drwx------ 2 root root 16384 May 15 14:43 lost+found
-rw-r--r-- 1 root root 576 May 15 14:43 lov_objid
drwxrwxrwx 2 root root 4096 May 15 14:43 NIDTBL_VERSIONS
drwxrwxrwx 2 root root 237568 Jul 20 08:56 OBJECTS
-rw-r--r-- 1 root root 31494144 May 15 14:43 oi.16.0
-rw-r--r-- 1 root root 26271744 May 15 14:43 oi.16.1
-rw-r--r-- 1 root root 49315840 May 15 14:43 oi.16.10
-rw-r--r-- 1 root root 29536256 May 15 14:43 oi.16.11
-rw-r--r-- 1 root root 26890240 May 15 14:43 oi.16.12
-rw-r--r-- 1 root root 20484096 May 15 14:43 oi.16.13
-rw-r--r-- 1 root root 30490624 May 15 14:43 oi.16.14
-rw-r--r-- 1 root root 33075200 May 15 14:43 oi.16.15
-rw-r--r-- 1 root root 25034752 May 15 14:43 oi.16.16
-rw-r--r-- 1 root root 43155456 May 15 14:43 oi.16.17
-rw-r--r-- 1 root root 27435008 May 15 14:43 oi.16.18
-rw-r--r-- 1 root root 21987328 May 15 14:43 oi.16.19
-rw-r--r-- 1 root root 29138944 May 15 14:43 oi.16.2
-rw-r--r-- 1 root root 21946368 May 15 14:43 oi.16.20
-rw-r--r-- 1 root root 28278784 May 15 14:43 oi.16.21
-rw-r--r-- 1 root root 28504064 May 15 14:43 oi.16.22
-rw-r--r-- 1 root root 30584832 May 15 14:43 oi.16.23
-rw-r--r-- 1 root root 27758592 May 15 14:43 oi.16.24
-rw-r--r-- 1 root root 22654976 May 15 14:43 oi.16.25
-rw-r--r-- 1 root root 24956928 May 15 14:43 oi.16.26
-rw-r--r-- 1 root root 45015040 May 15 14:43 oi.16.27
-rw-r--r-- 1 root root 27344896 May 15 14:43 oi.16.28
-rw-r--r-- 1 root root 36724736 May 15 14:43 oi.16.29
-rw-r--r-- 1 root root 23318528 May 15 14:43 oi.16.3
-rw-r--r-- 1 root root 25739264 May 15 14:43 oi.16.30
-rw-r--r-- 1 root root 26865664 May 15 14:43 oi.16.31
-rw-r--r-- 1 root root 29147136 May 15 14:43 oi.16.32
-rw-r--r-- 1 root root 28573696 May 15 14:43 oi.16.33
-rw-r--r-- 1 root root 26796032 May 15 14:43 oi.16.34
-rw-r--r-- 1 root root 30167040 May 15 14:43 oi.16.35
-rw-r--r-- 1 root root 31641600 May 15 14:43 oi.16.36
-rw-r--r-- 1 root root 21430272 May 15 14:43 oi.16.37
-rw-r--r-- 1 root root 24567808 May 15 14:43 oi.16.38
-rw-r--r-- 1 root root 29364224 May 15 14:43 oi.16.39
-rw-r--r-- 1 root root 22032384 May 15 14:43 oi.16.4
-rw-r--r-- 1 root root 41111552 May 15 14:43 oi.16.40
-rw-r--r-- 1 root root 41889792 May 15 14:43 oi.16.41
-rw-r--r-- 1 root root 34344960 May 15 14:43 oi.16.42
-rw-r--r-- 1 root root 45531136 May 15 14:43 oi.16.43
-rw-r--r-- 1 root root 34304000 May 15 14:43 oi.16.44
-rw-r--r-- 1 root root 32129024 May 15 14:43 oi.16.45
-rw-r--r-- 1 root root 30593024 May 15 14:43 oi.16.46
-rw-r--r-- 1 root root 33566720 May 15 14:43 oi.16.47
-rw-r--r-- 1 root root 31928320 May 15 14:43 oi.16.48
-rw-r--r-- 1 root root 32591872 May 15 14:43 oi.16.49
-rw-r--r-- 1 root root 29097984 May 15 14:43 oi.16.5
-rw-r--r-- 1 root root 38350848 May 15 14:43 oi.16.50
-rw-r--r-- 1 root root 24289280 May 15 14:43 oi.16.51
-rw-r--r-- 1 root root 41656320 May 15 14:43 oi.16.52
-rw-r--r-- 1 root root 35467264 May 15 14:43 oi.16.53
-rw-r--r-- 1 root root 37556224 May 15 14:43 oi.16.54
-rw-r--r-- 1 root root 32391168 May 15 14:43 oi.16.55
-rw-r--r-- 1 root root 31694848 May 15 14:43 oi.16.56
-rw-r--r-- 1 root root 35209216 May 15 14:43 oi.16.57
-rw-r--r-- 1 root root 34750464 May 15 14:43 oi.16.58
-rw-r--r-- 1 root root 33206272 May 15 14:43 oi.16.59
-rw-r--r-- 1 root root 40476672 May 15 14:43 oi.16.6
-rw-r--r-- 1 root root 26509312 May 15 14:43 oi.16.60
-rw-r--r-- 1 root root 29929472 May 15 14:43 oi.16.61
-rw-r--r-- 1 root root 34635776 May 15 14:43 oi.16.62
-rw-r--r-- 1 root root 23273472 May 15 14:43 oi.16.63
-rw-r--r-- 1 root root 34062336 May 15 14:43 oi.16.7
-rw-r--r-- 1 root root 33783808 May 15 14:43 oi.16.8
-rw-r--r-- 1 root root 33796096 May 15 14:43 oi.16.9
drwxr-xr-x 2 root root 5906432 May 15 14:43 PENDING
drwxr-xr-x 856 root root 36864 Jan 1 1970 ROOT
-rw-r--r-- 1 root root 24 May 15 14:43 seq_ctl
-rw-r--r-- 1 root root 24 May 15 14:43 seq_srv
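To put a number on how much space the 'oi.XX.XX' files take, the listing above can be totalled with a small script. This is only a minimal sketch: the function name `summarize_oi` and the regular expression are hypothetical helpers written for this ticket, and the pattern assumes the `ls -l` column layout shown above (size, month, day, time, name). It ignores the permission column, so it does not depend on how that column is rendered.

```python
import re

# Hypothetical pattern: captures the size and name of an oi.N.M entry,
# assuming the "size Mon DD HH:MM name" layout of the listing above.
OI_LINE = re.compile(r"\s(\d+)\s+\w{3}\s+\d+\s+[\d:]+\s+(oi\.\d+\.\d+)\s*$")


def summarize_oi(listing):
    """Return (file_count, total_bytes, largest_file_name) for oi.* entries."""
    sizes = {}
    for line in listing.splitlines():
        match = OI_LINE.search(line)
        if match:
            sizes[match.group(2)] = int(match.group(1))
    if not sizes:
        return 0, 0, None
    largest = max(sizes, key=sizes.get)
    return len(sizes), sum(sizes.values()), largest


if __name__ == "__main__":
    import sys

    # Usage (assumed): ls -l /path/to/mdt | python summarize_oi.py
    count, total, largest = summarize_oi(sys.stdin.read())
    print(f"{count} oi files, {total} bytes in total, largest: {largest}")
```

Piping the `ls -l` output of the ldiskfs-mounted MDT into the script prints the number of oi files, their combined size in bytes, and the name of the largest one.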
Issue Links
- is related to LU-4794 MDS threads all stuck in jbd2_journal_start (Resolved)