[LU-6096] sanity test_17m: e2fsck Inode 32775, i_size is 0, should be 4096

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.7.0
    • Severity: 3
    • Rank: 16971

    Description

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run of review-ldiskfs: https://testing.hpdd.intel.com/test_sets/277e606e-976d-11e4-bafa-5254006e85c2.

      I note that several other recent, similar failures have been marked as LU-3534.
      As I'm unsure of the reasoning for that, and this one is seen on el7, I've raised it as a new issue.
      Someone more expert may decide it's a duplicate after looking it over.

      The sub-test test_17m failed with the following error:

      e2fsck -fnvd /dev/lvm-Role_MDS/P1
      e2fsck 1.42.12.wc1 (15-Sep-2014)
      shadow-26vm8: check_blocks:2814: increase inode 32775 badness 0 to 1
      shadow-26vm8: check_blocks:2814: increase inode 32776 badness 0 to 1
      Pass 1: Checking inodes, blocks, and sizes
      Inode 32775, i_size is 0, should be 4096.  Fix? no
      
      Inode 32776, i_size is 0, should be 4096.  Fix? no
      

      Please provide additional information about the failure here.

      Info required for matching: sanity 17m
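
For triage, the Pass 1 complaints quoted in the description can be pulled out of an e2fsck log mechanically. The following is a minimal Python sketch, not part of the Lustre test framework: the function name `find_size_mismatches` is hypothetical, and the pattern assumes the message wording shown in the output above.

```python
import re

# Matches the Pass 1 size complaint as worded in the e2fsck output above.
SIZE_MISMATCH = re.compile(
    r"Inode (?P<inode>\d+), i_size is (?P<actual>\d+), should be (?P<expected>\d+)\."
)


def find_size_mismatches(e2fsck_output):
    """Return (inode, actual_size, expected_size) tuples found in the text."""
    return [
        (int(m["inode"]), int(m["actual"]), int(m["expected"]))
        for m in SIZE_MISMATCH.finditer(e2fsck_output)
    ]


# Sample text copied from the failure reported in this ticket.
sample = """\
Pass 1: Checking inodes, blocks, and sizes
Inode 32775, i_size is 0, should be 4096.  Fix? no

Inode 32776, i_size is 0, should be 4096.  Fix? no
"""

for inode, actual, expected in find_size_mismatches(sample):
    print(f"inode {inode}: i_size={actual}, e2fsck expects {expected}")
```

The inode numbers collected this way can then be passed to `ncheck` in debugfs, as is done later in this ticket, to find the affected pathnames.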

          Activity

            simmonsja James A Simmons added a comment - Actually, I have seen this error as well.

            bzzz Alex Zhuravlev added a comment - oh, yes, for sure.. we know, linux kernel api is super stable
            bogl Bob Glossman (Inactive) added a comment - - edited

            that's possible but not certain. el7 ldiskfs is http://review.whamcloud.com/#/c/10249

            could be some other non-obvious diff in kernel internal API for vfs, for example
            even some e2fsck flaw specific to el7 is possible

            bzzz Alex Zhuravlev added a comment - then can I guess this is introduced with one of the patches adding el7 support?

            bogl Bob Glossman (Inactive) added a comment - Alex, yes exactly. Appears 100% reproducible but only with el7 server.

            bzzz Alex Zhuravlev added a comment - Bob, do I understand correctly that you can reproduce the issue with el7 server only? I can't reproduce with el6 locally.

            adilger Andreas Dilger added a comment - The client version doesn't matter, only the server version, since clients don't interact with the backing filesystem directly. The affected files (PENDING, .lustre/fid) are internally created, so this is a bug in the osd-ldiskfs or MDD code.

            bogl Bob Glossman (Inactive) added a comment - non-zero Blockcount with 0 size on those entries looks kind of suspicious, but what do I know
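
The arithmetic behind that suspicion can be written out with the values from the debugfs output in this ticket (Blockcount: 8, one mapped block at logical offset 0). This is a rough Python sketch. It assumes that Blockcount is counted in 512-byte units and that e2fsck expects a directory's i_size to cover its last mapped block; neither rule is stated in the ticket itself, and the helper names are hypothetical.

```python
SECTOR_SIZE = 512  # assumption: debugfs "Blockcount" is in 512-byte units


def allocated_bytes(blockcount):
    """Bytes of allocated space implied by the Blockcount field."""
    return blockcount * SECTOR_SIZE


def expected_dir_size(last_logical_block, block_size):
    """Assumed rule: a directory's i_size must cover its last mapped block."""
    return (last_logical_block + 1) * block_size


blockcount = 8     # "Blockcount: 8" for both PENDING and .lustre/fid
mapped_blocks = 1  # "TOTAL: 1", with the single block at logical offset 0

block_size = allocated_bytes(blockcount) // mapped_blocks
print("block size:", block_size)
print("expected i_size:", expected_dir_size(0, block_size))
print("recorded i_size:", 0)  # "Size: 0" in the debugfs stat output
```

Under those assumptions, one allocated 4096-byte block with a recorded size of 0 is exactly the "i_size is 0, should be 4096" mismatch that e2fsck reports.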
            bogl Bob Glossman (Inactive) added a comment - - edited

            I tried the simplest possible config, client & server on one node. The problem reproduces.

            == sanity test 17m: run e2fsck against MDT which contains short/long symlink ========================= 14:21:42 (1421187702)
            create 512 short and long symlink files under /mnt/lustre/d17m.sanitym
            erase them
            Waiting for local destroys to complete
            recreate the 512 symlink files with a shorter string
            stop and checking mds1: e2fsck -fnvd /dev/sdb
            Stopping /mnt/mds1 (opts:) on centos7
            e2fsck 1.42.12.wc1 (15-Sep-2014)
            check_blocks:2814: increase inode 32777 badness 0 to 1
            check_blocks:2814: increase inode 32779 badness 0 to 1
            Pass 1: Checking inodes, blocks, and sizes
            Inode 32777, i_size is 0, should be 4096.  Fix? no
            
            Inode 32779, i_size is 0, should be 4096.  Fix? no
            
            Pass 2: Checking directory structure
            Pass 3: Checking directory connectivity
            Pass 4: Checking reference counts
            Pass 5: Checking group summary information
            
            lustre-MDT0000: ********** WARNING: Filesystem still has errors **********
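
The create, erase, and recreate sequence logged above can be approximated outside Lustre for illustration. The Python sketch below is not the sanity.sh implementation: the link names, the target lengths, and the alternation between short and long targets are all assumptions, and it runs against a temporary directory rather than an MDT.

```python
import os
import tempfile


def create_symlinks(directory, count, short_target, long_target):
    """Create `count` dangling symlinks, alternating short and long targets."""
    for i in range(count):
        target = short_target if i % 2 == 0 else long_target
        os.symlink(target, os.path.join(directory, f"link_{i}"))


def remove_symlinks(directory):
    """Remove every entry created in the scratch directory."""
    for name in os.listdir(directory):
        os.unlink(os.path.join(directory, name))


def run_sequence(directory, count=512):
    """Create, erase, then recreate the links with shorter target strings."""
    create_symlinks(directory, count, "s" * 16, "l" * 200)  # lengths are assumed
    remove_symlinks(directory)
    create_symlinks(directory, count, "s" * 8, "l" * 64)    # shorter strings
    return len(os.listdir(directory))


with tempfile.TemporaryDirectory() as scratch:
    print(run_sequence(scratch))
```

In the real test the interesting part happens afterwards, when the MDT is stopped and checked with e2fsck; this sketch only mirrors the file operations that precede that check.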
            

            debugfs output

            [root@centos7 bogl]# umount /dev/sdb
            [root@centos7 bogl]# debugfs /dev/sdb
            debugfs 1.42.12.wc1 (15-Sep-2014)
            debugfs:  ncheck 32777 32779
            Inode	Pathname
            32779	//PENDING
            32777	/ROOT/.lustre/fid
            debugfs:  stat PENDING
            Inode: 32779   Type: directory    Mode:  0755   Flags: 0x0
            Generation: 512259630    Version: 0x00000000:00000000
            User:     0   Group:     0   Size: 0
            File ACL: 0    Directory ACL: 0
            Links: 2   Blockcount: 8
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x54b599b4:7dbd4a3c -- Tue Jan 13 14:18:28 2015
             atime: 0x54b599b4:7dbd4a3c -- Tue Jan 13 14:18:28 2015
             mtime: 0x54b599b4:7dbd4a3c -- Tue Jan 13 14:18:28 2015
            crtime: 0x54b599b4:7dbd4a3c -- Tue Jan 13 14:18:28 2015
            Size of extra inode fields: 28
            Extended attributes stored in inode body: 
              lma = "00 00 00 00 00 00 00 00 03 00 00 00 02 00 00 00 0a 00 00 00 00 00 00 00
             " (24)
              lma: fid=[0x200000003:0xa:0x0] compat=0 incompat=0
            BLOCKS:
            (0):16620
            TOTAL: 1
            
            debugfs:  stat /ROOT/.lustre/fid
            Inode: 32777   Type: directory    Mode:  0100   Flags: 0x0
            Generation: 512259628    Version: 0x00000000:00000000
            User:     0   Group:     0   Size: 0
            File ACL: 0    Directory ACL: 0
            Links: 2   Blockcount: 8
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x54b599b4:7dbd4a3c -- Tue Jan 13 14:18:28 2015
             atime: 0x54b599b4:7dbd4a3c -- Tue Jan 13 14:18:28 2015
             mtime: 0x54b599b4:7dbd4a3c -- Tue Jan 13 14:18:28 2015
            crtime: 0x54b599b4:7dbd4a3c -- Tue Jan 13 14:18:28 2015
            Size of extra inode fields: 28
            Extended attributes stored in inode body: 
              lma = "00 00 00 00 00 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00 00 00 00 00
             " (24)
              lma: fid=[0x200000002:0x2:0x0] compat=0 incompat=0
            BLOCKS:
            (0):16618
            TOTAL: 1
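
The raw lma bytes in the two stat outputs can be decoded by hand to cross-check the FIDs that debugfs prints. The Python sketch below assumes a little-endian layout of two 32-bit fields (compat, incompat) followed by a FID made of a 64-bit sequence, a 32-bit object ID, and a 32-bit version. That layout is inferred from the decoded lines above, not taken from the Lustre headers, and `decode_lma` is a hypothetical helper.

```python
import struct


def decode_lma(hex_bytes):
    """Decode the 24-byte lma value shown by debugfs (layout is assumed)."""
    raw = bytes.fromhex(hex_bytes)
    # <  little-endian; I = u32 compat; I = u32 incompat;
    # Q = u64 FID sequence; I = u32 object ID; I = u32 version
    compat, incompat, seq, oid, ver = struct.unpack("<IIQII", raw[:24])
    return {
        "fid": f"[{seq:#x}:{oid:#x}:{ver:#x}]",
        "compat": compat,
        "incompat": incompat,
    }


# Byte strings copied from the debugfs output in this ticket.
pending = "00 00 00 00 00 00 00 00 03 00 00 00 02 00 00 00 0a 00 00 00 00 00 00 00"
dot_lustre_fid = "00 00 00 00 00 00 00 00 02 00 00 00 02 00 00 00 02 00 00 00 00 00 00 00"

print(decode_lma(pending))
print(decode_lma(dot_lustre_fid))
```

Both values decode to the same FIDs that debugfs reports, which suggests the extended attributes are intact and that only the size field of these two directories is inconsistent.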
            

            bogl Bob Glossman (Inactive) added a comment - I think the errors reported above were due to a missing quota-devel rpm on all my local test nodes. With that rpm installed on client & servers, I'm no longer seeing any error in sanity 17m. That's with el6 servers. Still trying to reproduce the original error in the ticket.

            bogl Bob Glossman (Inactive) added a comment - All the other test 17 subtests work fine. It looks like only 17m has problems.

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: maloo Maloo