
[LU-11549] Unattached inodes after 3 min racer run.

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.13.0
    • Affects Version/s: Lustre 2.11.0, Lustre 2.12.0
    • Labels: None
    • Severity: 3

    Description

      An attempt to run racer.sh on a DNE system with RPMs built from the wc master branch:

      [root@cslmodev100 racer]# sh racer.sh  -t 180 -T 7 -f 20 -c -d /mnt/testfs/racer-dir/
      Directory:     /mnt/testfs/racer-dir/
      Time Limit:    180
      Lustre Tests:  1
      Max Files:     20
      Threads:       7
      Running Tests: lustre_file_create dir_create file_rm file_rename file_link file_symlink file_list file_concat file_exec dir_remote
      MDS Count:     3
      Running racer.sh for 180 seconds. CTRL-C to exit
      file_create: FILE=/mnt/testfs/racer-dir//10 SIZE=207136
      file_create: FILE=/mnt/testfs/racer-dir//17 SIZE=4800
      file_create: FILE=/mnt/testfs/racer-dir//5 SIZE=73416
      file_create: FILE=/mnt/testfs/racer-dir//19 SIZE=116024
      ...
      file_create: FILE=/mnt/testfs/racer-dir//0 SIZE=234400
      file_create: FILE=/mnt/testfs/racer-dir//2 SIZE=136432
      file_create: FILE=/mnt/testfs/racer-dir//8 SIZE=53296
      file_create: FILE=/mnt/testfs/racer-dir//12 SIZE=233528
      racer cleanup
      sleeping 5 sec ...
      lustre_file_create.sh: no process found
      dir_create.sh: no process found
      file_rm.sh: no process found
      file_rename.sh: no process found
      file_link.sh: no process found
      file_symlink.sh: no process found
      file_list.sh: no process found
      file_concat.sh: no process found
      file_exec.sh: no process found
      dir_remote.sh: no process found
      there should be NO racer processes:
      root     201964  0.0  0.0 112660   988 pts/24   S+   11:03   0:00 grep -E lustre_file_create|dir_create|file_rm|file_rename|file_link|file_symlink|file_list|file_concat|file_exec|dir_remote
      Filesystem                                   1K-blocks    Used    Available Use% Mounted on
      172.18.1.3@o2ib1,172.18.1.4@o2ib1:/testfs 240559470792 4289196 238129624412   1% /mnt/testfs
      We survived racer.sh for 180 seconds.
      

       
      e2fsck on the MDT0 device:

      [root@cslmodev103 ~]# umount /dev/md66
      [root@cslmodev103 ~]# e2fsck -fvn /dev/md66
      e2fsck 1.42.13.x6 (01-Mar-2018)
      Pass 1: Checking inodes, blocks, and sizes
      Pass 2: Checking directory structure
      Pass 3: Checking directory connectivity
      Pass 4: Checking reference counts
      Inode 2061541938 ref count is 3, should be 2.  Fix? no
      
      Unattached inode 2061541970
      Connect to /lost+found? no
      
      Inode 2061541986 ref count is 2, should be 1.  Fix? no
      
      Unattached inode 2061542181
      Connect to /lost+found? no
      
      Inode 2061542575 ref count is 10, should be 9.  Fix? no
      
      Inode 2061542583 ref count is 6, should be 5.  Fix? no
      
      Pass 5: Checking group summary information
      [QUOTA WARNING] Usage inconsistent for ID 0:actual (1248931840, 1295) != expected (1248919552, 1295)
      Update quota info for quota type 0? no
      
      [QUOTA WARNING] Usage inconsistent for ID 0:actual (1248931840, 1295) != expected (1248919552, 1295)
      Update quota info for quota type 1? no
      
      
      testfs-MDT0000: ********** WARNING: Filesystem still has errors **********
      
      
              1304 inodes used (0.00%, out of 3042005760)
                85 non-contiguous files (6.5%)
                 2 non-contiguous directories (0.2%)
                   # of inodes with ind/dind/tind blocks: 72/64/0
         381833025 blocks used (25.10%, out of 1520996090)
                 0 bad blocks
                 2 large files
      
               555 regular files
               491 directories
                 0 character device files
                 0 block device files
                 0 fifos
               107 links
               249 symbolic links (249 fast symbolic links)
                 0 sockets
      ------------
              1400 files
      [root@cslmodev103 ~]#
      
      [root@cslmodev103 ~]# dumpe2fs -h /dev/md66 | grep -i state
      dumpe2fs 1.42.13.x6 (01-Mar-2018)
      Filesystem state:         clean
      [root@cslmodev103 ~]#
      
      

      Invalid symlink inodes are due to LU-11130 and wrong nlink counts are due to LU-11446;
      the unattached inodes are what this ticket is about.

      The racer test script does not use migration or striped directories,
      only "lustre_file_create dir_create file_rm file_rename file_link file_symlink file_list file_concat file_exec dir_remote". There were also no failovers.

      Lustre is built from the tip of the wc master branch:

      $ git log --oneline wc/master
      fe7c13bd48 (wc/master) LU-11329 utils: create tests maintainers list
      70a01a6c9c LU-11276 ldlm: don't apply ELC to converting and DOM locks
      72372486a5 LU-11347 osd: do not use pagecache for I/O
      8b9105d828 LU-11199 mdt: Attempt lookup lock on open
      697e8fe6f3 LU-11473 doc: add lfs-getsom man page
      ed0c19d250 LU-1095 misc: quiet console messages at startup
      ....
      
      [root@cslmodev103 ~]# rpm -q lustre_ib
      lustre_ib-2.11.56_16_gfe7c13b-1.el7.centos.x86_64
      [root@cslmodev103 ~]#
      


          Activity

            Alexander Zarochentsev made changes -
            Link New: This issue is related to LU-13346
            Andreas Dilger made changes -
            Resolution New: Fixed
            Status Original: Open New: Resolved


            Andreas Dilger added a comment - The fix for this problem has landed for 2.13.0, so this ticket should be closed so that it can be tracked properly for the release. Since the test patch is not currently passing tests, I've opened a separate ticket to track its landing.
            Andreas Dilger made changes -
            Link New: This issue is related to LU-12848
            Peter Jones made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Alexander Zarochentsev [ zam ]
            Andreas Dilger made changes -
            Fix Version/s New: Lustre 2.13.0

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35776/
            Subject: LU-11549 mdd: set LUSTRE_ORPHAN_FL for non-dirs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8d27c92a66d63aaf8b8fbe1fc73e49263b5bed1e


            Alexander Zarochentsev (c17826@cray.com) uploaded a new patch: https://review.whamcloud.com/35991
            Subject: LU-11549 tests: link succeded to an ophan remote object
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 3e6574997f82407c98b90318448400bb3a1ca5e0


            Alexander Zarochentsev added a comment - Patch https://review.whamcloud.com/35776 fixes the issue: my reproducer no longer shows any filesystem corruption. Unfortunately, the reproducer still reports an incorrect link creation even with the fix, because the test did not account for the client repeating the link() operation after getting an ESTALE error, so the second attempt to create the link succeeds.

            The fix is a one-liner:

            diff --git a/lustre/mdd/mdd_dir.c b/lustre/mdd/mdd_dir.c
            index 17613d6..260dd7a 100644
            --- a/lustre/mdd/mdd_dir.c
            +++ b/lustre/mdd/mdd_dir.c
            @@ -1456,7 +1456,7 @@ static int mdd_mark_orphan_object(const struct lu_env *env,
                    struct lu_attr *attr = MDD_ENV_VAR(env, la_for_start);
                    int rc;
             
            -       if (!S_ISDIR(mdd_object_type(obj)))
            +       if (S_ISDIR(mdd_object_type(obj)))
                            return 0;
             
                    attr->la_valid = LA_FLAGS;
            

            Alexander Zarochentsev (c17826@cray.com) uploaded a new patch: https://review.whamcloud.com/35776
            Subject: LU-11549 mdd: set LUSTRE_ORPHAN_FL for non-dirs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: fd759510de938a3556eddf3b5798ce95135e1dbc


            People

              Assignee: Alexander Zarochentsev
              Reporter: Alexander Zarochentsev
              Votes: 0
              Watchers: 9

              Dates

                Created:
                Updated:
                Resolved: