[LU-1547] MDT remounted read-only, MDS hung, MDT corrupted Created: 21/Jun/12 Updated: 11/Mar/14 Resolved: 11/Mar/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.x (1.8.0 - 1.8.5) |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | HP Slovakia team (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | ldiskfs | ||
| Environment: |
OS RHEL 5.5 cluster, MDT, OST on LVM volumes, SAN, storage HP XP24k |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 4000 |
| Description |
|
Our customer saw the MDT remounted read-only after the MDS was relocated from cluster node sklusp01b to sklusp01a. |
| Comments |
| Comment by Peter Jones [ 21/Jun/12 ] |
|
Niu is investigating this one |
| Comment by Niu Yawei (Inactive) [ 21/Jun/12 ] |
Jun 17 23:03:08 sklusp01a kernel: LDISKFS-fs error (device dm-11): ldiskfs_lookup: unlinked inode 27720411 in dir #29287441
Jun 17 23:03:08 sklusp01a kernel: Remounting filesystem read-only

It looks like ldiskfs failed to find the inode, which is a filesystem inconsistency error and caused the read-only remount. I'm not sure whether it's an ext4 problem; I'll investigate further. Andreas, Johann, any comments? Thanks. |
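The kernel messages quoted above have a regular layout, so the device, inode, and directory numbers can be pulled out of a syslog file mechanically. The following is a minimal Python sketch, assuming the message layout shown in this ticket; the regular expressions and the function name `parse_ldiskfs_errors` are illustrative and not part of any Lustre tool.

```python
import re

# Pattern modeled on the single message quoted in this ticket (assumption:
# other ldiskfs error messages may be laid out differently).
LDISKFS_ERR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"LDISKFS-fs error \(device (?P<device>[^)]+)\):\s"
    r"(?P<func>\w+):\s(?P<msg>.*)$"
)
# Detail format of the "unlinked inode" message seen here.
INODE_DIR = re.compile(r"unlinked inode (?P<inode>\d+) in dir #(?P<dir>\d+)")


def parse_ldiskfs_errors(lines):
    """Return one dict per LDISKFS-fs error line; other lines are skipped."""
    records = []
    for line in lines:
        match = LDISKFS_ERR.match(line.strip())
        if match is None:
            continue
        record = match.groupdict()
        detail = INODE_DIR.search(record["msg"])
        if detail is not None:
            record["inode"] = int(detail.group("inode"))
            record["dir"] = int(detail.group("dir"))
        records.append(record)
    return records
```

Feeding it the lines of `/var/log/messages` from the MDS node would give a list of records keyed by `stamp`, `host`, `device`, `func`, and, where present, `inode` and `dir`.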
| Comment by HP Slovakia team (Inactive) [ 21/Jun/12 ] |
|
New, possibly important information: |
| Comment by Johann Lombardi (Inactive) [ 21/Jun/12 ] |
|
Could you please clarify what you intended to do with the lvconvert command? It is likely the root cause of your problem. |
| Comment by Andreas Dilger [ 21/Jun/12 ] |
|
The initial recovery appears to find a valid Lustre filesystem to mount:

Jun 17 22:52:52 sklusp01a kernel: Lustre: 11216:0:(mds_fs.c:677:mds_init_server_data()) RECOVERY: service l1-MDT0000, 56 recoverable clients, 0 delayed clients, last_transno 133173826553

Later on, it finds a single error in the filesystem when it is cleaning up the orphan inodes:

Jun 17 23:03:08 sklusp01a kernel: LDISKFS-fs error (device dm-11): ldiskfs_lookup: unlinked inode 27720411 in dir #29287441
Jun 17 23:03:08 sklusp01a kernel: Remounting filesystem read-only
Jun 17 23:03:08 sklusp01a kernel: LDISKFS-fs warning (device dm-11): kmmpd: kmmpd being stopped since filesystem has been remounted as readonly.

After failover to sklusp01b (which was quickly shut down), the MDS service is again started on sklusp01a and sees the same error:

Jun 18 00:25:03 sklusp01a kernel: LDISKFS-fs error (device dm-14): ldiskfs_lookup: unlinked inode 27720411 in dir #29287441
Jun 18 00:25:03 sklusp01a kernel: Remounting filesystem read-only

At least during these times, the filesystem was intact enough to be able to mount and read basic Lustre configuration files. I can't comment on the severity of the corruption seen by e2fsck, but the kernel only saw a relatively minor problem (directory entry for an open-unlinked inode was actually deleted, which may possibly relate to nlink problems previously fixed in https://bugzilla.lustre.org/show_bug.cgi?id=22177 for 1.8.3).

It also appears you have MMP enabled on this filesystem, which would normally prevent it from being mounted on two nodes at the same time. From the timestamps in the logs, it does not appear that the two MDS services were active at the same time on the two nodes.

Unfortunately, I'm not familiar enough with the details of CLVM and what lvconvert does in this case to comment on whether this is safe to do on a running system or not.
It is possible that "lvconvert" and/or the mirror resync process incorrectly mirrored the LV between nodes, possibly leaving some part of the device inconsistent between the two MDS nodes. It is also possible (depending on how IO was being done by LVM to keep the mirrors in sync) that data was still in cache on sklusp01b, and not flushed to disk on sklusp01a at the time of failover.

The MDS does operations asynchronously in memory, and only flushes them to disk every few seconds at transaction commit time. Conversely, the OSS does writes synchronously to disk because this avoids too much memory pressure at high IO rates, so the same inconsistency might not be visible on the OSS because its data is synced to disk frequently. Having the output from e2fsck would allow a guess at what type of corruption was seen, and how it might have been introduced. |
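The observation that the same inode and directory error recurs on a second device-mapper node after the restart can be checked by grouping the error lines by (inode, directory) and listing when and on which device each one appeared. This is a minimal Python sketch under stated assumptions: the function name `group_recurring_errors` is hypothetical, the pattern only covers the "unlinked inode" message quoted in this ticket, and the year 2012 is assumed from the ticket date because syslog timestamps omit it.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches only the "unlinked inode ... in dir #..." error quoted in this ticket.
ERR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"LDISKFS-fs error \(device (?P<device>[^)]+)\):\s\w+:\s"
    r"unlinked inode (?P<inode>\d+) in dir #(?P<dir>\d+)"
)


def group_recurring_errors(lines, year=2012):
    """Map (inode, dir) to a time-sorted list of (timestamp, host, device)."""
    groups = defaultdict(list)
    for line in lines:
        match = ERR.match(line.strip())
        if match is None:
            continue
        # Syslog lines carry no year, so one is supplied (assumption).
        when = datetime.strptime(
            f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S"
        )
        key = (int(match["inode"]), int(match["dir"]))
        groups[key].append((when, match["host"], match["device"]))
    return {key: sorted(events) for key, events in groups.items()}
```

A key that shows up with more than one device name, as inode 27720411 does here with dm-11 and dm-14, suggests the inconsistency is on the underlying volume rather than tied to one mapping of it.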
| Comment by HP Slovakia team (Inactive) [ 21/Jun/12 ] |
|
I have attached the fsck -fn ... output. It was run after the failover to sklusp01b, once the MDS had been stopped. |
| Comment by HP Slovakia team (Inactive) [ 21/Jun/12 ] |
|
To Johann's question: |
| Comment by John Fuchs-Chesney (Inactive) [ 07/Mar/14 ] |
|
Akos and HP Slovakia team, |
| Comment by HP Slovakia team (Inactive) [ 11/Mar/14 ] |
|
John, the ticket can be closed. The issue was caused by a bug in CLVM. |
| Comment by Peter Jones [ 11/Mar/14 ] |
|
Thanks Akos! |