Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.15.5
- None
- 3
- 9223372036854775807
Description
While adding many OSTs in parallel, 5 of the 16 MDSes crashed with journal corruption errors (errcode -28). About 4000 OSTs were being added at the time.
Error 28 (ENOSPC) doesn't make sense as an out-of-space condition: this is a brand-new file system with plenty of free space. The ldiskfs (ext4-based) journal is 32 GB, and the system was not in use while the OSTs were being added.
This issue is very similar to LU-18378.
[33763.860611] LDISKFS-fs error (device dm-10): ldiskfs_getblk:1014: inode #166: block 14072026: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 1994, credits 5/0, errcode -28
Feb 27 13:09:06 mds1-primary-vni[33763.863335] LDISKFS-fs (dm-10): Remounting filesystem read-only
c-924205 kernel:[33763.864085] LDISKFS-fs error (device dm-10) in osd_trans_stop:2104: error 28
Lustre: ctl-lus[33763.865059] LDISKFS-fs error (device dm-10) in osd_trans_stop:2104: IO failure
trefs-MDT0000: super-sequence allocation rc = 0 [0x0000005800000400-0x0000005840000400]:33:ost
Feb 27 13:09:06 mds1-primary-vnic-924205 kernel: Lustre: Skipped 218 previous similar messages
Feb 27 13:09:06 mds1-primary-vnic-924205 kernel: Lustre: 339425:0:(osd_io.c:2114:osd_ldiskfs_write_record()) lustrefs-MDT0000/: adding bh without locking off 99200 (block 24, size 32, offs 99200)
Feb 27 13:09:06 mds1-primary-vnic-924205 kernel: WARNING: CPU: 25 PID: 339425 at fs/jbd2/transaction.c:1526 jbd2_journal_dirty_metadata+0x247/0x260 [jbd2]
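For triage, the first error line above can be decoded mechanically. The following is a minimal Python sketch, not part of the original report: the `decode` helper and its regular expression are hypothetical, and it assumes the `credits 5/0` field means "requested/remaining" journal credits. Under that assumption, and given that the linked LU-19967 concerns credits calculation, one plausible reading is that the -28 (ENOSPC) reflects an exhausted journal handle rather than a full disk.

```python
import errno
import re

# First error line from the log above (timestamp prefix omitted).
LINE = (
    "LDISKFS-fs error (device dm-10): ldiskfs_getblk:1014: inode #166: "
    "block 14072026: comm llog_process_th: journal_dirty_metadata failed: "
    "handle type 0 started at line 1994, credits 5/0, errcode -28"
)

# Hypothetical pattern for the "journal_dirty_metadata failed" message.
# Assumption: the two credit numbers are requested/remaining.
PATTERN = re.compile(
    r"journal_dirty_metadata failed: handle type (?P<type>\d+) "
    r"started at line (?P<line>\d+), "
    r"credits (?P<requested>\d+)/(?P<remaining>\d+), "
    r"errcode (?P<errcode>-?\d+)"
)


def decode(line):
    """Return the parsed fields of the error line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    code = abs(int(m.group("errcode")))
    return {
        "handle_type": int(m.group("type")),
        "started_at_line": int(m.group("line")),
        "credits_requested": int(m.group("requested")),
        "credits_remaining": int(m.group("remaining")),
        # Map the numeric code to its symbolic errno name on this platform.
        "errno_name": errno.errorcode.get(code, "UNKNOWN"),
    }


if __name__ == "__main__":
    print(decode(LINE))
```

Running the same helper over the corresponding line from LU-18378 would make it easy to compare the handle start line and credit counts between the two crashes.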
Issue Links
- is related to
  - LU-18378 MDS crashes with LDISKFS-fs error (device dm-2): ldiskfs_getblk:1014: inode #95: block 5895972: comm llog_process_th: journal_dirty_metadata failed: handle type 0 started at line 1990, credits 5/0, errcode -28 (Resolved)
  - LU-19967 take extent tree depth and sb into account for credits calculation (Resolved)
- is related to
  - LU-18495 osd_ldiskfs_write_record(): adding bh without locking off (Resolved)