Details
-
Bug
-
Resolution: Won't Fix
-
Minor
-
None
-
None
-
Lustre 1.8.4ddn3.1
Kernel 2.6.18-194.32.1.el5
CentOS 5.x
-
2
-
23,959
-
6112
Description
The OSS throws an LDISKFS-fs error stating that it encountered an invalid block bitmap. As a result, the OST is remounted read-only and the OSS must be rebooted to recover. A subsequent 'e2fsck -fp <dev>' replays the journal and finds no errors on the OST.
This issue has been seen sporadically during internal stress testing by Bernd and by some customers in the field. Other Lustre users have also seen it and reported it on the lustre-discuss list. A Bugzilla ticket is open, but it has had no support activity since November 2010. I'm opening a Jira bug so this can be worked on.
https://bugzilla.lustre.org/show_bug.cgi?id=23959
Logs from the onset of the invalid block bitmap error:
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521146] LDISKFS-fs error (device dm-21): ldiskfs_valid_block_bitmap: Invalid block bitmap - block_group = 57, block = 1867778
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521155] Aborting journal on device dm-21-8.
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521183] LDISKFS-fs error (device dm-21): ldiskfs_journal_start_sb: Detected aborted journal
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521188] LDISKFS-fs (dm-21): Remounting filesystem read-only
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521205] LustreError: 16643:0:(fsfilt-ldiskfs.c:1320:fsfilt_ldiskfs_write_record()) can't start transaction for 37 blocks (128 bytes)
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521212] LustreError: 16643:0:(filter.c:192:filter_finish_transno()) wrote trans 21483454236 for client 1279815f-edd6-33ed-a1d2-a6685e1060af at #1606: err = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521219] LustreError: 16643:0:(filter_io_26.c:520:filter_direct_io()) can't close transaction: -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521228] LDISKFS-fs error (device dm-21) in fsfilt_ldiskfs_commit: IO failure
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521303] LustreError: 16465:0:(fsfilt-ldiskfs.c:496:fsfilt_ldiskfs_brw_start()) can't get handle for 582 credits: rc = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521314] LustreError: 16465:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521329] LustreError: 16618:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521337] LustreError: 16455:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521347] LustreError: 16645:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.521687] LustreError: 16532:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.522014] LustreError: 16513:0:(fsfilt-ldiskfs.c:496:fsfilt_ldiskfs_brw_start()) can't get handle for 582 credits: rc = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.522019] LustreError: 16513:0:(fsfilt-ldiskfs.c:496:fsfilt_ldiskfs_brw_start()) Skipped 4 previous similar messages
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.522026] LustreError: 16513:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.523770] LustreError: 16578:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.524759] LDISKFS-fs (dm-21): Remounting filesystem read-only
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529215] LDISKFS-fs error (device dm-21) in ldiskfs_ext_new_extent_cb: Journal has aborted
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529237] LustreError: 16405:0:(fsfilt-ldiskfs.c:1320:fsfilt_ldiskfs_write_record()) can't start transaction for 37 blocks (128 bytes)
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529244] LustreError: 16405:0:(filter.c:192:filter_finish_transno()) wrote trans 21483454237 for client f612f805-2ae3-e606-2bc6-074c557919a6 at #1608: err = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529250] LustreError: 16405:0:(filter_io_26.c:520:filter_direct_io()) can't close transaction: -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529452] LustreError: 18162:0:(obd.h:1394:obd_transno_commit_cb()) lfs0-OST0018: transno 21483454237 commit error: 2
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529690] LustreError: 16435:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
Jul 2 23:23:20 lfs-oss-0-1 kernel: [4424700.529703] LustreError: 16410:0:(filter_io_26.c:690:filter_commitrw_write()) error starting transaction: rc = -30
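For triage, the fields in the first error line and the repeated return code can be pulled apart with a short script. The following is a minimal Python sketch, not part of the original report. The helper names (parse_bitmap_error, locate_block, describe_rc) are hypothetical, and the 4 KiB block size is an assumption about how the OST was formatted. Under that assumption each group holds 8 * 4096 = 32768 blocks, so block 1867778 falls in group 57, which agrees with the block_group value in the log. The rc = -30 seen throughout is the negated errno EROFS (read-only file system), which matches the read-only remount.

```python
import errno
import os
import re

# First error line from the log above, with the syslog prefix dropped.
LINE = ("LDISKFS-fs error (device dm-21): ldiskfs_valid_block_bitmap: "
        "Invalid block bitmap - block_group = 57, block = 1867778")

PATTERN = re.compile(
    r"LDISKFS-fs error \(device (?P<dev>[^)]+)\): "
    r"ldiskfs_valid_block_bitmap: Invalid block bitmap - "
    r"block_group = (?P<group>\d+), block = (?P<block>\d+)"
)

# ASSUMPTION: 4 KiB blocks and first data block 0. One bitmap block
# has 8 * 4096 bits, so each block group then covers 32768 blocks.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE


def parse_bitmap_error(line):
    """Return device, group and block from an invalid-bitmap line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "device": match.group("dev"),
        "group": int(match.group("group")),
        "block": int(match.group("block")),
    }


def locate_block(block, blocks_per_group=BLOCKS_PER_GROUP):
    """Return (group index, offset within group) for a block number."""
    return divmod(block, blocks_per_group)


def describe_rc(rc):
    """Map a negative kernel return code to its errno name and message."""
    code = abs(rc)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    info = parse_bitmap_error(LINE)
    print(info)
    print(locate_block(info["block"]))
    print(describe_rc(-30))
```

If the OST was formatted with a different block size, pass the matching blocks_per_group value to locate_block; the group index computed here only lines up with the logged block_group when the assumption holds.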
Issue Links
- Trackbacks
-
Lustre 1.8.x known issues tracker
While testing against the Lustre b18 branch, we would hit known bugs that were already reported in Lustre Bugzilla (https://bugzilla.lustre.org/). In order to move away from relying on Bugzilla, we would create a JIRA