Lustre / LU-4523

Need explanation for FS corruption - ldiskfs_mb_free_metadata: Double free of blocks

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • None
    • None
    • 3
    • 12367

    Description

      The customer, Yale, encountered file system corruption on one of their OST devices, dm-20, which is "scratch-OST0028". The customer ran e2fsck on that device, which fixed the corruption, but they would now like an RCA to prevent it from happening again.

      The corruption was first reported on Jan 11, but there were no irregular events on the storage side that would have caused it. This could indicate that the corruption happened some time earlier and was only detected on the 11th.

      Jan 11 11:54:33 oss7 kernel: Lustre: 2916:0:(o2iblnd_cb.c:2249:kiblnd_passive_connect()) Conn stale 10.191.133.6@o2ib [old ver: 12, new ver: 12]
      Jan 11 11:54:33 oss7 kernel: Lustre: 2916:0:(o2iblnd_cb.c:2249:kiblnd_passive_connect()) Skipped 101 previous similar messages
      Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20): ldiskfs_mb_free_metadata: Double free of blocks 30208 (30208 148)
      Jan 11 11:59:52 oss7 kernel: Aborting journal on device dm-20-8.
      Jan 11 11:59:52 oss7 kernel: LDISKFS-fs (dm-20): Remounting filesystem read-only
      Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_reserve_inode_write: Journal has aborted
      Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_ext_remove_space: Journal has aborted
      Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_reserve_inode_write: Journal has aborted
      Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_orphan_del: Journal has aborted
      Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_reserve_inode_write: Journal has aborted
      Jan 11 11:59:52 oss7 kernel: LDISKFS-fs error (device dm-20) in ldiskfs_ext_truncate: Journal has aborted
      Jan 11 11:59:52 oss7 kernel: LustreError: 22850:0:(fsfilt-ldiskfs.c:369:fsfilt_ldiskfs_start()) error starting handle for op 8 (106 credits): rc -30
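To check whether earlier LDISKFS errors for the same device appear in a longer log such as the attached kern.log.1, the error lines can be pulled out and grouped by device. The following is only a minimal sketch: the function name `ldiskfs_errors_by_device` is hypothetical, and the pattern assumes the syslog line layout shown in the excerpt above (timestamp, host, `kernel:`, then the LDISKFS message). Logs in another layout would need a different pattern.

```python
import re
import sys
from collections import defaultdict

# Matches the two LDISKFS error shapes seen in the excerpt above:
#   ... kernel: LDISKFS-fs error (device dm-20): <function>: <message>
#   ... kernel: LDISKFS-fs error (device dm-20) in <function>: <message>
# The layout is an assumption taken from that excerpt only.
LDISKFS_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"LDISKFS-fs error \(device (?P<device>[^)]+)\)(?::| in)\s+"
    r"(?P<func>\w+):\s+(?P<message>.*)$"
)


def ldiskfs_errors_by_device(lines):
    """Return {device: [(timestamp, function, message), ...]} in log order."""
    found = defaultdict(list)
    for line in lines:
        match = LDISKFS_ERROR.match(line.strip())
        if match:
            found[match["device"]].append(
                (match["stamp"], match["func"], match["message"])
            )
    return dict(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is whatever log file you want to scan):
    #   python scan_ldiskfs.py kern.log.1
    with open(sys.argv[1], errors="replace") as log:
        for device, events in ldiskfs_errors_by_device(log).items():
            stamp, func, message = events[0]
            print(f"{device}: {len(events)} error(s), first at {stamp}: {func}: {message}")
```

Because entries are kept in log order, the first tuple for each device is the earliest error recorded in that file. This only shows when the kernel first reported a problem, not when the on-disk corruption was introduced.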

      FSCK output:
      e2fsck 1.42.3.wc3 (15-Aug-2012)
      device /dev/mapper/ost_scratch_40 mounted by lustre per /proc/fs/lustre/obdfilter/scratch-OST0028/mntdev
      Warning! /dev/mapper/ost_scratch_40 is mounted.
      MMP interval is 10 seconds and total wait time is 42 seconds. Please wait...
      Warning: skipping journal recovery because doing a read-only filesystem check.
      scratch-OST0028 contains a file system with errors, check forced.
      Pass 1: Checking inodes, blocks, and sizes

      Running additional passes to resolve blocks claimed by more than one inode...
      Pass 1B: Rescanning for multiply-claimed blocks

      What other debugging information or data can be collected to explain the problem?

      Thanks,
      Oz

      Attachments

        1. kern.log.1
          165 kB
        2. uname_r
          0.0 kB
        3. version
          0.1 kB

          People

            hongchao.zhang Hongchao Zhang
            orentas Oz Rentas
            Votes: 0
            Watchers: 4
