Lustre / LU-13832

"lfs migrate -m" leads to inconsistent ldiskfs directories

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.14.0
    • None
    • 3

    Description

      I created a test directory striped across all MDTs, containing striped DNE subdirectories, as follows:

      # export MDSCOUNT=8
      # export DIR=/mnt/testfs/allmdt
      # lfs mkdir -c -1 $DIR
      # for D in $(seq $MDSCOUNT); do
          lfs mkdir -c 2 $DIR/dirstr$D
          rsync -a --exclude "policy.*" /etc/ $DIR/dirstr$D/
      done
      

      This populated each test directory with a variety of files whose contents can be verified against /etc. Next, I migrated each directory to a randomly chosen MDT and verified that its contents had not changed (the rsync dry run should not report any files that need to be updated):

      # for D in $(seq $MDSCOUNT); do
          echo $DIR/dirstr$D
          lfs migrate -m $((RANDOM % MDSCOUNT)) -c2 $DIR/dirstr$D
          rsync -av --exclude "policy.*" --dry-run /etc/ $DIR/dirstr$D/
      done
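The dry-run check above depends on reading rsync's verbose output by eye. Below is a minimal sketch of a scriptable variant. It assumes that `rsync -a -n -i` prints nothing when the destination already matches the source. `verify_copy` is a hypothetical helper name and is not part of Lustre or rsync.

```shell
#!/bin/bash
# Hypothetical helper: verify_copy SRC DST
# Exit status: 0 if an rsync dry run would change nothing under DST,
# 1 if it would (the itemized differences are printed), 2 if rsync failed.
verify_copy() {
    local src=$1 dst=$2 changes
    # -a  recurse and compare permissions, times, owners, etc.
    # -n  dry run: never modify DST
    # -i  print one itemized line for each entry that would change
    changes=$(rsync -a -n -i --exclude 'policy.*' "$src/" "$dst/") || return 2
    if [ -n "$changes" ]; then
        printf '%s\n' "$changes"
        return 1
    fi
    return 0
}
```

Inside the reproducer loop, the `rsync -av --dry-run ...` line could then become `verify_copy /etc $DIR/dirstr$D || echo "mismatch in $DIR/dirstr$D"`. A mismatch is then reported immediately instead of being buried in verbose output.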
      

      After running this loop a couple of times, I ran e2fsck on the MDTs. All of them showed the same problem on many remote directories:

      e2fsck 1.45.2.wc1 (27-May-2019)
      Pass 1: Checking inodes, blocks, and sizes
      Pass 2: Checking directory structure
      Directory entry for '.' in ... (25191) is big. Split? yes
      Missing '..' in directory inode 25191. Fix? yes
      Setting filetype for entry '..' in ... (25191) to 2.
      :
      Pass 3: Checking directory connectivity   [[[ WHEN NOT FIXING ]]]
      '..' in /REMOTE_PARENT_DIR/0x200000407:0x6f5:0x0 (26203) is <The NULL inode> (0), should be /REMOTE_PARENT_DIR (25001).
      Fix? no
      [[[ OR ]]]
      Pass 3: Checking directory connectivity  [[[ WHEN FIXING ]]]
      Unconnected directory inode 25191 (/???)
      Connect to /lost+found? yes
      :
      Pass 4: Checking reference counts
      Inode 2 ref count is 0, should be 11.  Fix? yes
      
      Inode 25191 ref count is 3, should be 2.  Fix? yes
      

      Looking at the directories under REMOTE_PARENT_DIR, it appears that the ".." entry is missing. As a result, "." is a single 4096-byte entry that consumes the whole first directory block. This may have gone unnoticed until now because these directories are all small and never grow large enough to be split into an HTREE, which would add a ".." entry as part of struct dx_info.
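To make the on-disk symptom concrete: in a linear ext4/ldiskfs directory, the first block normally starts with a 12-byte "." entry. A ".." entry follows it, and its rec_len covers the rest of the block (less a 12-byte checksum tail when metadata_csum is enabled). Below is a minimal sketch that inspects such a block. It assumes the standard struct ext4_dir_entry_2 layout and assumes the first block has already been copied to a raw file; extracting it from the MDT is not covered here. `check_dot_dotdot`, `byte` and `le16` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: check_dot_dotdot BLOCKFILE [BLOCKSIZE]
# BLOCKFILE is assumed to hold the raw first block of a linear directory,
# with entries laid out as struct ext4_dir_entry_2 (little-endian):
#   +0 __le32 inode, +4 __le16 rec_len, +6 __u8 name_len,
#   +7 __u8 file_type, +8 name[name_len] (padded to 4 bytes)

# Print the unsigned byte at offset $2 of file $1.
byte() {
    set -- $(od -An -t u1 -j "$2" -N 1 "$1")
    echo "$1"
}

# Print the little-endian 16-bit value at offset $2 of file $1.
le16() {
    set -- $(od -An -t u1 -j "$2" -N 2 "$1")
    echo $(( $1 + 256 * $2 ))
}

check_dot_dotdot() {
    local f=$1 bs=${2:-4096} rec_len
    # The first entry must be "." (name_len 1, name byte 46 = '.').
    if [ "$(byte "$f" 6)" -ne 1 ] || [ "$(byte "$f" 8)" -ne 46 ]; then
        echo "first entry is not '.'"
        return 2
    fi
    rec_len=$(le16 "$f" 4)
    # If "." spans the whole block there is no room left for "..".
    if [ "$rec_len" -ge "$bs" ]; then
        echo "'.' rec_len=$rec_len covers the whole block: '..' is missing"
        return 1
    fi
    if [ "$rec_len" -lt 12 ] || [ $((rec_len + 12)) -gt "$bs" ]; then
        echo "'.' rec_len=$rec_len is out of range"
        return 2
    fi
    # The next entry starts at offset rec_len and should be "..".
    if [ "$(byte "$f" $((rec_len + 6)))" -eq 2 ] &&
       [ "$(byte "$f" $((rec_len + 8)))" -eq 46 ] &&
       [ "$(byte "$f" $((rec_len + 9)))" -eq 46 ]; then
        echo "ok: '..' found at offset $rec_len"
        return 0
    fi
    echo "entry after '.' is not '..'"
    return 1
}
```

The check reads only the first two entries. This is enough to distinguish the broken layout described above from a healthy one without walking the rest of the block.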

            People

              laisiyao Lai Siyao
              adilger Andreas Dilger