Lustre / LU-12386

ldiskfs-fs error: ldiskfs_iget:4374: inode #x: comm ll_ostx_y: bad extra_isize (36832 != 512)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.10.5
    • Environment:
      lustre 2.10.5.2.chaos-1.ch6_1
      e2fsprogs 1.42.13.wc6-7.el7.x86_64
      kernel 3.10.0-862.14.4.1chaos.ch6.x86_64
      client side is running lustre 2.10.6_2.chaos
    • Severity: 3

    Description

      A similar error occurred on 2 OSTs, on 2 different nodes using 2 different DDN RAID controllers. In each case the OST aborted its journal and was remounted read-only.

      A subsequent e2fsck successfully cleared the problem inodes and the targets re-mounted.

      We don't have extra_isize showing as a file system feature on these OST devs, or at least it doesn't show up in dumpe2fs output.

      The OSTs have been up and running ok since last September or so.  
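
      For reference, the feature flags mentioned above can be listed read-only with dumpe2fs (the device path below is illustrative, not taken from this report):

          # print only the superblock summary; extra_isize would appear on the
          # "Filesystem features:" line if that feature flag were set
          dumpe2fs -h /dev/sdX | grep -i features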

      Attachments

        1. fsck-ost0008.txt
          6.10 MB
          Ruth Klundt
        2. fsck-ost002e.txt
          6 kB
          Ruth Klundt

        Activity

          pjones Peter Jones added a comment -

          ok - thanks Ruth


          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          Thanks for looking. I think we would mostly be concerned with whether there is a need to upgrade anything in order to avoid a repeat occurrence. If not, then we can close for now.


          adilger Andreas Dilger added a comment -

          Looking through the logs I don't see any kind of pattern with the broken inodes. They appear to be just random corruption in the inode block, with random flags set and bogus file sizes.

          It looks like the problem is limited to one block in the inode table (8 inodes), and the superblock, which could be recovered from a backup. The inodes were cleared by e2fsck, since they no longer contained useful information, so there isn't anything that can be done to recover the data there. It doesn't look like there are any other problems with the filesystem.

          At this point it isn't clear if anything can be done to diagnose the source of this problem. I don't know the hardware well enough to say whether the drive or cable that Joe reported could be causing this or not.

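          For reference, the backup superblock locations Andreas mentions can be listed read-only with dumpe2fs (the device path below is illustrative):

              # scan the group descriptors and print primary/backup superblock locations
              dumpe2fs /dev/sdX | grep -i superblock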

          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          ost002e output was not captured from the beginning.

          For ost0008 I removed all of the 'fix? yes' lines and the lines describing 'count wrong for group', since they applied to the whole fs - someone has to actually look at these in order to approve release. Let me know if those omissions are of interest. Basically every group had a default value for blocks and inodes and was updated.


          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          I'm working on getting the e2fsck output cleared to post.

          Other errors on the filesystem are things that are more or less usual, like high-order page allocation failures and grant complaints (LU-9704).

          jamervi Joe Mervini added a comment -

          I checked out the storage subsystem, and from the storage side of things (this is an SFA12K 10 stack) only 1 drive in the system is reporting a physical error. Otherwise there are no other reported errors. However, I checked the IO channels (IB), and one of the channels not associated with the servers hosting the OSTs is reporting symbol errors that appear to be a bad cable. This started getting reported at ~16:30 on 6/1 in the controller log. No other messages, with the exception of the 'keep alive' messages, were reported.


          adilger Andreas Dilger added a comment -

          Ok, that rules out the old mke2fs bug.

          Do you have the actual e2fsck output? Sometimes it is possible to see, based on what corrupt values are printed, what might have been overwriting a block. It definitely seems like block-level corruption, since we have 8 512-byte OST inodes per 4KB block.

          Any errors on the controllers? Any other errors in the filesystems?

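          For reference, the inode and block sizes behind the 8-inodes-per-block figure above can be read from the superblock (the device path below is illustrative):

              # 4096-byte blocks divided by 512-byte inodes = 8 inodes per inode-table block
              dumpe2fs -h /dev/sdX | grep -Ei 'inode size|block size'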

          ruth.klundt@gmail.com Ruth Klundt (Inactive) added a comment -

          Thanks for the info about extra_isize; I'm less confused.

          The filesystem was created on new gear in September 2018, with the software stack as listed above. One of the OSTs had a sequential group of 5 inodes with the problem, and they all had other corruption such as huge i_size, too many blocks, dtime set, and bitmap fixes were necessary. Also the fsck had to use a backup superblock because of 'bad block for block bitmap'.

          Not sure how to determine whether these inodes are new or old. The inode numbers were in the 36M range, with each OST having ~72M total inodes. Currently the number of inodes in use is ~3M on all the OSTs.

          On the other OST, 8 consecutive inode numbers (in the 33M range) were showing other problems in addition to extra_isize. No bad superblock though.

          The OSTs are relatively large compared to what we've had before on ldiskfs, at 74TB. They are 46% full.

          Not sure about recent changes in user activity, I'll be looking around for that.

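          For reference, a forced read-only check can be pointed at a backup superblock, as the repair above had to do (the device path, backup block number, and block size below are illustrative):

              # -n = read-only, -f = force check, -b/-B = backup superblock location and block size
              e2fsck -fn -b 32768 -B 4096 /dev/sdX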

          Ruth, the "inode #x" part of the message may be relevant if this filesystem was formatted a long time ago. There was a bug in very old mke2fs that didn't zero out the extra inode space in very low-numbered inodes (e.g. inodes 2-15 or so).

          Otherwise, it appears that this is inode corruption with some random garbage. Do you have the e2fsck output to see if those inodes had other corruption, or was only the i_extra_isize bad? When was the previous time that e2fsck was run? Was there anything run recently that would cause very old files to be accessed for the first time in a long time, or is this corruption on a recently-created file?

          Note that the extra_isize feature is not needed to use the large inode space; that is enabled by default when the filesystem is formatted with inodes larger than 256 bytes (as with all Lustre filesystems) and enough space is reserved for the current kernel's fixed inode fields (currently 32 bytes). The extra_isize feature is only needed in the case where additional space is reserved beyond what the fixed inode fields require.

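          For reference, the on-disk i_extra_isize value that the error message complains about can be inspected read-only with debugfs (the device path and inode number below are illustrative):

              # dump one inode by number; "Size of extra inode fields:" is i_extra_isize,
              # which should be a small value such as 32 rather than garbage like 36832
              debugfs -R 'stat <36000000>' /dev/sdX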

          People

            adilger Andreas Dilger
            ruth.klundt@gmail.com Ruth Klundt (Inactive)
            Votes: 0
            Watchers: 4
