[LU-501] ldiskfs on disk corruption Created: 12/Jul/11  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Alexey Lyashkov Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 2
Labels: None
Environment:

RHEL6 u0 (2.6.32-71.x.x) kernel, swraid


Severity: 3
Rank (Obsolete): 10119

 Description   

Jul 7 13:12:57 lmtest403 kernel: LDISKFS-fs error (device md7): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 81corrupted: 32768 blocks free in bitmap, 31744 - in gd
Jul 7 13:12:57 lmtest403 kernel:
Jul 7 13:12:57 lmtest403 kernel: Aborting journal on device md9p6.
Jul 7 13:12:57 lmtest403 kernel: LDISKFS-fs (md7): Remounting filesystem read-only
Jul 7 13:12:57 lmtest403 kernel: LustreError: 2963:0:(filter_io_26.c:765:filter_commitrw_write()) Failure to commit OST transaction (-5)?
Jul 7 13:12:57 lmtest403 kernel: LDISKFS-fs error (device md7): ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 81corrupted: 32768 blocks free in bitmap, 31744 - in gd
Jul 7 13:12:57 lmtest403 kernel:
Jul 7 13:12:57 lmtest403 kernel: LDISKFS-fs error (device md7) in ldiskfs_ext_new_extent_cb: IO failure
Jul 7 13:12:57 lmtest403 kernel: LustreError: 2998:0:(fsfilt-ldiskfs.c:497:fsfilt_ldiskfs_brw_start()) can't get handle for 301 credits: rc = -30
Jul 7 13:12:57 lmtest403 kernel: LustreError: 2984:0:(fsfilt-ldiskfs.c:1421:fsfilt_ldiskfs_write_record()) can't start transaction for 37 blocks (128 bytes)
Jul 7 13:12:57 lmtest403 kernel: LustreError: 2984:0:(filter.c:209:filter_finish_transno()) wrote trans 13763 for client e165ad12-c7ce-601a-cc3e-09e5d34abf1f at #1: err = -30
Jul 7 13:12:57 lmtest403 kernel: LustreError: 2984:0:(filter_io_26.c:531:filter_direct_io()) can't close transaction: -30
Jul 7 13:12:57 lmtest403 kernel: LDISKFS-fs error (device md7) in fsfilt_ldiskfs_commit_async: IO failure
Jul 7 13:12:57 lmtest403 kernel: LustreError: 2984:0:(fsfilt-ldiskfs.c:557:fsfilt_ldiskfs_commit_async()) error while stopping

I have no other information, just this copy from the console.
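The two counts in the ldiskfs_mb_check_ondisk_bitmap message can be compared directly. The following is a minimal sketch for extracting them from a console line like the one above; the function name parse_bitmap_error and the regular expression are hypothetical helpers written for illustration, not part of Lustre or ldiskfs, and the pattern only assumes the message layout seen in this log.

```python
import re

# Pattern modelled on the message text in the log above (an assumption, not
# taken from the ldiskfs source). The missing space in "81corrupted" is how
# the console printed it, so whitespace before "corrupted" is optional.
BITMAP_ERR = re.compile(
    r"LDISKFS-fs error \(device (?P<dev>\w+)\): ldiskfs_mb_check_ondisk_bitmap: "
    r"on-disk bitmap for group (?P<group>\d+)\s*corrupted: "
    r"(?P<bitmap_free>\d+) blocks free in bitmap, (?P<gd_free>\d+) - in gd"
)


def parse_bitmap_error(line):
    """Return device, group and free-block counts from one log line, or None."""
    m = BITMAP_ERR.search(line)
    if m is None:
        return None
    bitmap_free = int(m.group("bitmap_free"))
    gd_free = int(m.group("gd_free"))
    return {
        "device": m.group("dev"),
        "group": int(m.group("group")),
        "bitmap_free": bitmap_free,   # free blocks counted in the on-disk bitmap
        "gd_free": gd_free,           # free blocks recorded in the group descriptor
        "difference": bitmap_free - gd_free,
    }


line = (
    "Jul 7 13:12:57 lmtest403 kernel: LDISKFS-fs error (device md7): "
    "ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 81corrupted: "
    "32768 blocks free in bitmap, 31744 - in gd"
)
info = parse_bitmap_error(line)
print(info)
```

For the line logged here, this reports device md7, group 81, and a difference of 1024 blocks between the bitmap and the group descriptor.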



 Comments   
Comment by Marek Magrys [ 19/Aug/11 ]

I think we have also hit the same problem on RHEL5 with external RAID (LSI7900). Moreover, it usually happens on the same OSTs (i.e. only 3 different OSTs were involved in the last 10 crashes). It would be nice to raise the priority to Major, as it causes some computing jobs to fail.

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.
