
mkfs.lustre in 1.44.3.wc1 causes corruption if 'metadata_csum' option enabled

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.13.0
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0
    • Severity: 3

    Description

      # mkfs.lustre --mgs --fsname=lustre /dev/sda 
      # mkfs.lustre --ost --servicenode=172.16.251.20@o2ib --mgsnode=172.16.251.20@o2ib --fsname=lustre --index=0 --reformat --mkfsoptions='-t ext4' /dev/sdc
      
      # mount -t lustre /dev/sda /tmp/mgs/
      # mount -t lustre /dev/sdc /tmp/ost0/
      
      Feb  5 06:08:59 ai200-7f94-vm00 kernel: LDISKFS-fs (sdc): file extents enabled, maximum tree depth=5
      Feb  5 06:08:59 ai200-7f94-vm00 kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: errors=remount-ro
      Feb  5 06:08:59 ai200-7f94-vm00 kernel: LDISKFS-fs (sdc): file extents enabled, maximum tree depth=5
      Feb  5 06:08:59 ai200-7f94-vm00 kernel: LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache,nodelalloc
      Feb  5 06:08:59 ai200-7f94-vm00 kernel: LDISKFS-fs error (device sdc): htree_dirblock_to_tree:1278: inode #11272193: block 721437184:
               comm mount.lustre: bad entry in directory: rec_len is too small for name_len - offset=4084(4084), inode=0, rec_len=12, name_len=0
      Feb  5 06:08:59 ai200-7f94-vm00 kernel: Aborting journal on device sdc-8.
      Feb  5 06:08:59 ai200-7f94-vm00 kernel: LDISKFS-fs (sdc): Remounting filesystem read-only
      Feb  5 06:09:04 ai200-7f94-vm00 kernel: LDISKFS-fs warning (device sdc): kmmpd:187: kmmpd being stopped since filesystem has been remounted as readonly.
      

      Without '-t ext4', there was no problem.
      Also, e2fsprogs 1.42.13.wc6 did not cause the problem, even with '-t ext4'. The Lustre version does not seem to matter: the problem was hit with 2.10, 2.12, and master.
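
To confirm whether a freshly formatted target really has metadata_csum enabled, the superblock feature list can be inspected before mounting. The following is a minimal sketch, assuming the usual `dumpe2fs -h` output layout (a `Filesystem features:` line followed by space-separated feature names). The helper name `has_fs_feature` is hypothetical and is not part of Lustre or e2fsprogs.

```shell
#!/bin/sh
# has_fs_feature FEATURE: read `dumpe2fs -h` output on stdin and succeed
# (exit status 0) only if FEATURE appears on the "Filesystem features:" line.
# Hypothetical helper for illustration only.
has_fs_feature() {
    awk -v want="$1" '
        /^Filesystem features:/ {
            sub(/^Filesystem features:[ \t]*/, "")
            n = split($0, feats, /[ \t]+/)
            for (i = 1; i <= n; i++)
                if (feats[i] == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Example (needs root and a real device, so it is left commented out):
#   dumpe2fs -h /dev/sdc 2>/dev/null | has_fs_feature metadata_csum \
#       && echo "metadata_csum is enabled on /dev/sdc"
```

The match is on whole feature names, so a feature such as metadata_csum_seed is not mistaken for metadata_csum.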

          Activity

            [LU-11922] mkfs.lustre in 1.44.3.wc1 causes corruption if 'metadata_csum' option enabled

            adilger Andreas Dilger added a comment -

            What is the reason for adding "-t ext4"? That enables variable and untested features on the filesystem, as shown by this ticket, and is not required AFAIK.

            mhanafi Mahmoud Hanafi added a comment -

            Oh yes, we do pass "-t ext4" during our format. We use a CSV file with lustre_configure. Here is an example (IP addresses are masked out):

            service401-ib1,"options lnet networks=o2ib(ib1)",/dev/mapper/nbp10_1-MGS0,/mnt/lustre/nbp10_1-MGS0,mgs,"nbp10",x.x.x.x@o2ib:x.x.x.x@o2ib,,,"-m 0 ","errors=panic,user_xattr,max_sectors_kb=0",x.x.x.159@o2ib:x.x.x.x@o2ib
            service401-ib1,"options lnet networks=o2ib(ib1)",/dev/mapper/nbp10_1-MDT0,/mnt/lustre/nbp10_1-MDT0,mdt,nbp10,"x.x.x.x@o2ib:x.x.x.x@o2ib",0,,"-m 0 -N 200000000 -t ext4","acl,errors=panic,user_xattr,max_sectors_kb=0",x.x.x.x@o2ib:x.x.x.x@o2ib
            service403-ib1,"options lnet networks=o2ib(ib1)",/dev/mapper/nbp10_2-MDT1,/mnt/lustre/nbp10_2-MDT1,mdt,nbp10,"x.x.x.x@o2ib:x.x.x.x@o2ib",1,,"-m 0 -N 200000000 -t ext4","acl,errors=panic,user_xattr,max_sectors_kb=0",x.x.x.x@o2ib:x.x.x.x@o2ib
            service401-ib1,"options lnet networks=o2ib(ib1)",/dev/mapper/nbp10_1-OST0,/mnt/lustre/nbp10_1-OST0,ost,nbp10,"x.x.x.x@o2ib:x.x.x.x@o2ib",0,,"-m 0 -N 34000000  -t ext4 -E packed_meta_blocks=1","acl,errors=panic,user_xattr,max_sectors_kb=0",x.x.x.x@o2ib:x.x.x.x@o2ib
            service403-ib1,"options lnet networks=o2ib(ib1)",/dev/mapper/nbp10_2-OST1,/mnt/lustre/nbp10_2-OST1,ost,nbp10,"x.x.x.x@o2ib:10.x.26.x@o2ib",1,,"-m 0 -N 34000000  -t ext4 -E packed_meta_blocks=1","acl,errors=panic,user_xattr,max_sectors_kb=0",x.x.x.x@o2ib:x.x.x.x@o2ib

            adilger Andreas Dilger added a comment -

            Sure, I understand that the option was coming from /etc/mke2fs.conf.

            My question is why metadata_csum was taken from /etc/mke2fs.conf. mkfs.lustre doesn't specify any options to mke2fs that would normally cause this feature to be used. Did you have a custom mke2fs.conf file that added it to the [defaults] section, or did you specify some extra option like "mkfs.lustre --mkfsoptions='-t ext4'" that took this option from the ext4 entry of the [fs_types] section, as was originally reported in this ticket? Do you have the output from mkfs.lustre that shows the command-line options for mke2fs?
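
To see which features a given "-t" type would pull in from the configuration file, the relevant stanza can be printed directly. The following is a rough sketch, assuming the standard mke2fs.conf layout in which each type under `[fs_types]` is written as `name = {` ... `}` with a `features = a,b,c` line inside. The helper name `fstype_features` is hypothetical, and the scan is line-based rather than a full parser of the format.

```shell
#!/bin/sh
# fstype_features TYPE [FILE]: print the "features" value of the TYPE stanza
# in the [fs_types] section of an mke2fs.conf-style file.
# FILE defaults to /etc/mke2fs.conf. Hypothetical helper for illustration.
fstype_features() {
    awk -v type="$1" '
        # A new [section] header ends any stanza we were in.
        /^\[/ { in_types = ($1 == "[fs_types]"); in_stanza = 0; next }
        in_types && $1 == type && $2 == "=" && $3 == "{" { in_stanza = 1; next }
        in_stanza && $1 == "}" { in_stanza = 0; next }
        in_stanza && $1 == "features" && $2 == "=" {
            sub(/^[ \t]*features[ \t]*=[ \t]*/, "")
            print
        }
    ' "${2:-/etc/mke2fs.conf}"
}

# Example: report whether "-t ext4" would request metadata_csum.
#   fstype_features ext4 | tr ',' '\n' | grep -qx metadata_csum \
#       && echo "ext4 stanza enables metadata_csum"
```

Features listed in the `[defaults]` section are not covered by this helper and would need to be checked separately.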

            mhanafi Mahmoud Hanafi added a comment -

            The format option was getting picked up from /etc/mke2fs.conf. I just removed the option from the file as a workaround.
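
The workaround described above (dropping the feature from the configuration file) could be scripted roughly as follows. This is only a sketch under the assumption that features appear on comma-separated `features = ...` lines. The helper name `strip_feature` is hypothetical, and the result should be reviewed by hand before it replaces /etc/mke2fs.conf.

```shell
#!/bin/sh
# strip_feature FEATURE: filter an mke2fs.conf-style file on stdin, removing
# FEATURE from every comma-separated "features = ..." list.
# All other lines pass through unchanged. Hypothetical helper.
strip_feature() {
    awk -v drop="$1" '
        $1 == "features" && $2 == "=" {
            match($0, /^[ \t]*features[ \t]*=[ \t]*/)
            prefix = substr($0, 1, RLENGTH)
            n = split(substr($0, RLENGTH + 1), feats, ",")
            out = ""
            for (i = 1; i <= n; i++) {
                gsub(/[ \t]/, "", feats[i])      # tolerate spaces around commas
                if (feats[i] != drop && feats[i] != "")
                    out = out (out == "" ? "" : ",") feats[i]
            }
            print prefix out
            next
        }
        { print }
    '
}

# Example: write a candidate file next to the original for manual review.
#   strip_feature metadata_csum < /etc/mke2fs.conf > /tmp/mke2fs.conf.new
#   diff -u /etc/mke2fs.conf /tmp/mke2fs.conf.new
```

Writing to a separate file and diffing first keeps the original configuration intact until the change has been checked.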

            adilger Andreas Dilger added a comment -

            Mahmoud, could you please comment on how you are seeing the metadata_csum feature being enabled for your filesystem? Is it possible that you are supplying additional formatting options to mkfs.lustre that would enable metadata_csum?

            Lustre does not enable this feature in mkfs.lustre, since this feature has never been tested. The patch here is meant only to fix an obvious bug in the combination of metadata_csum and dirdata, and is not in any way an endorsement of the use of metadata_csum. Metadata checksums are surprisingly less useful for ldiskfs than they are for e.g. ZFS, because ldiskfs has no backup copy of the metadata that can be used to recover from checksum errors, as ZFS does.

            When the mounted filesystem hits a checksum error, the best it can do is report an error and make the filesystem read-only. e2fsck then only has the option of recalculating the checksum based on the current metadata contents, or considering the metadata corrupt and discarding the inode/block/directory entirely. In essence, recomputing the checksum is no better than the kernel just ignoring the bad checksum and continuing to use the metadata as-is, except with a system outage in the middle. Since ldiskfs already validates metadata content (metadata_csum being only a recent addition), it can typically already determine whether the content is corrupted. ZFS, on the other hand, blindly assumes that if the block checksum is correct then the contents must be valid, and will happily use bad data within the block (e.g. dereference index values read from disk that exceed the bounds of an array).

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35833
            Subject: LU-11922 ldiskfs: make dirdata work with metadata_csum
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 12e2fe9da3e7f80177070a9e5f2906ea5d5cd4f1

            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34219/
            Subject: LU-11922 ldiskfs: make dirdata work with metadata_csum
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ec7a166a498be607c3882ff11e98b625839e69d0

            dongyang Dongyang Li added a comment -

            We need to do more testing on this, especially performance tests. I've just done some simple mdtest.

            @Mahmoud correct, for now you can remove metadata_csum from /etc/mke2fs.conf.


            gerrit Gerrit Updater added a comment -

            Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/34219
            Subject: LU-11922 ldiskfs: make dirdata work with metadata_csum
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 466a6c245551fde0b414956e236a0a383a39d3c0

            People

              Assignee: dongyang Dongyang Li
              Reporter: sihara Shuichi Ihara