Lustre / LU-7261

EA list corruption

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.5.0, Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0
    • Labels: None
    • Environment: RHEL6 + lustre/master
    • Severity: 3

    Description

      Following some customer requests, I checked the large EA patch and found that it leaves a corrupted EA list behind. This is easy to reproduce: leave the test file on disk and run e2fsck / debugfs against it.
      The following patch can be used to keep the test file in place:

      bash-3.2$ git diff
      diff --git a/lustre/tests/sanity.sh b/lustre/tests/sanity.sh
      index 824ba8f..620a626e 100644
      --- a/lustre/tests/sanity.sh
      +++ b/lustre/tests/sanity.sh
      @@ -6810,7 +6810,7 @@ grow_xattr() {
              [[ "$new" != "$orig" ]] && error "$xbig different after growing $xsml"
              log "$xbig still valid after growing $xsml"
       
      -       rm -f $file
      +#      rm -f $file
       }
       
       test_102h() { # bug 15777
      diff --git a/lustre/tests/test-framework.sh b/lustre/tests/test-framework.sh
      index be6d1ec..d1e91fb 100755
      --- a/lustre/tests/test-framework.sh
      +++ b/lustre/tests/test-framework.sh
      @@ -4325,8 +4325,8 @@ check_and_cleanup_lustre() {
           fi
       
              if is_mounted $MOUNT; then
      -               [ -n "$DIR" ] && rm -rf $DIR/[Rdfs][0-9]* ||
      -                       error "remove sub-test dirs failed"
      +#              [ -n "$DIR" ] && rm -rf $DIR/[Rdfs][0-9]* ||
      +#                      error "remove sub-test dirs failed"
                      [ "$ENABLE_QUOTA" ] && restore_quota || true
              fi
      

      debugfs / e2fsck output:

      ]# /Users/shadow/work/lustre/work/WorkQ/CLSTR-4851/e2fsprogs/debugfs/debugfs -R "stat ROOT/f102ha.sanity" /tmp/lustre-mdt1
      debugfs 1.42.12.x1 (03-Apr-2015)
      Inode: 133   Type: regular    Mode:  0644   Flags: 0x0
      Generation: 1408916363    Version: 0x00000001:00000010
      User:     0   Group:     0   Size: 0
      File ACL: 0    Directory ACL: 0
      Links: 1   Blockcount: 0
      Fragment:  Address: 0    Number: 0    Size: 0
       ctime: 0x5614a1b5:00000000 -- Wed Oct  7 07:38:13 2015
       atime: 0x5614a1a8:00000000 -- Wed Oct  7 07:38:00 2015
       mtime: 0x5614a1a8:00000000 -- Wed Oct  7 07:38:00 2015
      crtime: 0x5614a1a8:def3d250 -- Wed Oct  7 07:38:00 2015
      Size of extra inode fields: 28
      Extended attributes stored in inode body: 
        lma = "00 00 00 00 00 00 00 00 01 04 00 00 02 00 00 00 03 00 00 00 00 00 00 00 " (24)
        lma: fid=[0x200000401:0x3:0x0] compat=0 incompat=0
        lov = "d0 0b d1 0b 01 00 00 00 03 00 00 00 00 00 00 00 01 04 00 00 02 00 00 00 00 00 10 00 02 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 " (80)
        link = "df f1 ea 11 01 00 00 00 37 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1f 00 00 00 02 00 00 00 07 00 00 00 01 00 00 00 00 66 31 30 32 68 61 2e 73 61 6e 69 74 79 " (55)
      1. invalid EA entry in inode -> big
      BLOCKS:
      e2fsck 1.42.12.x1 (03-Apr-2015)
      Pass 1: Checking inodes, blocks, and sizes
      Extended attribute in inode 133 has a value size (65536) which is invalid
      Clear? yes
      
      Pass 2: Checking directory structure
      Pass 3: Checking directory connectivity
      Pass 4: Checking reference counts
      Unattached inode 134
      Connect to /lost+found? yes
      
      Unattached inode 135
      Connect to /lost+found? yes
      
      Pass 5: Checking group summary information
      

      One note: e2fsck 1.42.3.wc3 (15-Aug-2012) does not detect this EA corruption.

      Root cause: when an EA record changes and we update the value offsets of the remaining entries, large EA entries (those with e_value_inum set) are not skipped, so their offsets are modified as well.
      The fix is simple:

      -@@ -605,13 +883,17 @@ ext4_xattr_set_entry(struct ext4_xattr_i
      +@@ -606,13 +884,18 @@ ext4_xattr_set_entry(struct ext4_xattr_i
                              last = s->first;
                              while (!IS_LAST_ENTRY(last)) {
                                      size_t o = le16_to_cpu(last->e_value_offs);
       -                              if (!last->e_value_block &&
       -                                  last->e_value_size && o < offs)
      -+                              if (last->e_value_size > 0 && o < offs)
      ++                              if ((last->e_value_size > 0 && o < offs) 
      ++                                   && last->e_value_inum == 0)
                                              last->e_value_offs =
                                                      cpu_to_le16(o + size);
                                      last = EXT4_XATTR_NEXT(last);
      

      However, I am unable to submit the patch because I have no Gerrit login since the OAuth changes.
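
      The offset update that the fix changes can be illustrated with a small standalone sketch. It is not the kernel code: struct ea_entry, shift_value_offsets() and demo_large_offs() are hypothetical names, endianness conversion is omitted, and the sketch assumes that a large EA entry (non-zero value_inum) keeps a value offset of 0 because its value is stored outside the inode body.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for struct ext4_xattr_entry. */
struct ea_entry {
	uint16_t value_offs;	/* offset of the value in the inode body or block */
	uint32_t value_inum;	/* non-zero: value is stored in a separate inode */
	uint32_t value_size;	/* size of the value in bytes */
};

/*
 * After a value at offset `offs` changes, values stored below it are
 * moved by `size` bytes and their recorded offsets must follow.
 * With skip_large == 0 this mirrors the unpatched condition, which also
 * touches large EA entries; with skip_large != 0 it mirrors the fix.
 */
static void shift_value_offsets(struct ea_entry *e, size_t n,
				size_t offs, size_t size, int skip_large)
{
	for (size_t i = 0; i < n; i++) {
		size_t o = e[i].value_offs;

		/* Fixed behaviour: a large EA has no in-body value to move. */
		if (skip_large && e[i].value_inum != 0)
			continue;
		if (e[i].value_size > 0 && o < offs)
			e[i].value_offs = (uint16_t)(o + size);
	}
}

/* Returns the large EA entry's offset after one update (illustrative values). */
static unsigned demo_large_offs(int skip_large)
{
	struct ea_entry e[2] = {
		{ .value_offs = 100, .value_inum = 0,  .value_size = 24 },
		{ .value_offs = 0,   .value_inum = 12, .value_size = 65536 },
	};

	shift_value_offsets(e, 2, 200, 56, skip_large);
	return e[1].value_offs;
}
```

      In this sketch, demo_large_offs(0) leaves the large EA entry with a bogus offset of 56, while demo_large_offs(1) keeps it at 0; the in-body entry is shifted in both cases.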

            People

              Assignee: Andreas Dilger (adilger)
              Reporter: Alexey Lyashkov (shadow)