Details
- Type: Bug
- Resolution: Not a Bug
- Priority: Critical
- Fix Version/s: None
- Affects Version/s: Lustre 1.8.x (1.8.0 - 1.8.5)
- Environment: RHEL 5.5 cluster; MDT and OSTs on LVM volumes over SAN, HP XP24k storage
- 3
- 4000
Description
Our customer experienced the MDT being remounted read-only after relocating the MDS from cluster node sklusp01b to sklusp01a. They also relocated OSS services at the same time.
When they noticed the read-only status, they tried to stop the MDS. The attempt was unsuccessful: the server became unresponsive, and the other cluster node (sklusp01b) fenced the sklusp01a MDS server and took over the MDT. The sklusp01b node was stopped after the take-over, and
they then ran fsck, which reported a huge number of errors. The repair was unsuccessful, and they ended up recreating the whole Lustre file system and restoring it from backup.
Is it possible to determine the root cause from the logs?
Attachments
Issue Links
- Trackbacks: Lustre 1.8.x known issues tracker — "While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA