
Failure to delete over a million files in a DNE2 directory.

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.8.0
    • Lustre 2.8.0
    • pre-2.8 clients with DNE2 directories which contain 1 million or more files.

    Description

      In my testing of DNE2 I'm seeing problems when creating 1 million+ files per directory. Clearing out the debug logs I see the problem is only on the client side. When running an application I see:

      command line used: /lustre/sultan/stf008/scratch/jsimmons/mdtest -I 100000 -i 5 -d /lustre/sultan/stf008/scratch/jsimmons/dne2_4_mds_md_test/shared_1000k_10
      Path: /lustre/sultan/stf008/scratch/jsimmons/dne2_4_mds_md_test
      FS: 21.8 TiB Used FS: 0.2% Inodes: 58.7 Mi Used Inodes: 4.6%

      10 tasks, 1000000 files/directories
      aprun: Apid 3172: Caught signal Window changed, sending to application
      08/03/2015 10:34:45: Process 0(nid00028): FAILED in create_remove_directory_tree, Unable to remove directory: No such file or directory
      Rank 0 [Mon Aug 3 10:34:45 2015] [c0-0c0s1n2] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
      _pmiu_daemon(SIGCHLD): [NID 00028] [c0-0c0s1n2] [Mon Aug 3 10:34:45 2015] PE RANK 0 exit signal Aborted
      aprun: Apid 3172: Caught signal Interrupt, sending to application
      _pmiu_daemon(SIGCHLD): [NID 00012] [c0-0c0s6n0] [Mon Aug 3 10:50:50 2015] PE RANK 7 exit signal Interrupt
      _pmiu_daemon(SIGCHLD): [NID 00018] [c0-0c0s6n2] [Mon Aug 3 10:50:50 2015] PE RANK 9 exit signal Interrupt
      _pmiu_daemon(SIGCHLD): [NID 00013] [c0-0c0s6n1] [Mon Aug 3 10:50:50 2015] PE RANK 8 exit signal Interrupt

      After the test failed, any attempt to remove the files created by the test fails. When I attempt to remove the files I see the following errors in dmesg:

      LustreError: 5430:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2
      LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2
      LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) Skipped 7 previous similar messages
      LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2
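
The "rc -2" in these messages is a kernel-style negative errno; errno 2 is ENOENT, which matches the "No such file or directory" text in the mdtest failure above. A minimal sketch of that decoding in userspace C follows. The helper names rc_to_errno and describe_rc are hypothetical, made up for this illustration, and are not part of Lustre.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Convert a kernel-style return code (negative errno) to a positive errno.
 * Hypothetical helper for illustration only. */
int rc_to_errno(int rc)
{
	return rc < 0 ? -rc : rc;
}

/* Write "rc <n>: <strerror text>" into buf and return buf.
 * Hypothetical helper for illustration only. */
const char *describe_rc(int rc, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "rc %d: %s", rc, strerror(rc_to_errno(rc)));
	return buf;
}
```

Calling describe_rc(-2, buf, sizeof(buf)) on the value logged by ll_prep_inode() yields the system's message for ENOENT.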

      Attachments

        1. lctldump.20150813
          0.2 kB
        2. LU-6381.log
          0.2 kB
        3. LU-6984-backtrace.log
          83 kB
        4. lu-6984-Sept-18-2015.tgz
          0.2 kB

          Activity

            [LU-6984] Failure to delete over a million files in a DNE2 directory.
            pjones Peter Jones added a comment -

            Right you are James!


            simmonsja James A Simmons added a comment -

            I have tested this patch extensively and it has resolved this issue. Now that it has landed we can close this ticket.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16490/
            Subject: LU-6984 lmv: remove nlink check in lmv_revalidate_slaves
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 5725559786086f328f4c899936967eb6e5dce46e

            simmonsja James A Simmons added a comment -

            So far your fix seems to have resolved this issue. Tomorrow I will run the mdtest to see if everything works.

            gerrit Gerrit Updater added a comment -

            wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/16490
            Subject: LU-6984 lmv: remove nlink check in lmv_revalidate_slaves
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d676ea2ab1f55dfa8e04ed5fa074444315808329

            di.wang Di Wang added a comment -

            Ah, it seems this check in lmv_revalidate_slaves is not correct:

                                    if (unlikely(body->mbo_nlink < 2)) {
                                            /* If this is bad stripe, most likely due
                                             * to the race between close(unlink) and
                                             * getattr, let's return -ENOENT, so llite
                                             * will revalidate the dentry see
                                             * ll_inode_revalidate_fini() */
                                            CDEBUG(D_INODE, "%s: nlink %d < 2 bad stripe %d"
                                                   DFID ":" DFID"\n",
                                                   obd->obd_name, body->mbo_nlink, i,
                                                   PFID(&lsm->lsm_md_oinfo[i].lmo_fid),
                                                   PFID(&lsm->lsm_md_oinfo[0].lmo_fid));
            
                                            if (it.d.lustre.it_lock_mode && lockh) {
                                                    ldlm_lock_decref_and_cancel(lockh,
                                                             it.d.lustre.it_lock_mode);
                                                    it.d.lustre.it_lock_mode = 0;
                                            }
            
                                            GOTO(cleanup, rc = -ENOENT);
                                    }
            

            Because of the DIR_NLINK behavior described in this comment:

                    /*
                     * The DIR_NLINK feature allows directories to exceed LDISKFS_LINK_MAX
                     * (65000) subdirectories by storing "1" in i_nlink if the link count
                     * would otherwise overflow. Directory traversal tools understand
                     * that (st_nlink == 1) indicates that the filesystem does not track
                     * hard link count on the directory, and will not abort subdirectory
                     * scanning early once (st_nlink - 2) subdirs have been found.
                     *
                     * This also has to properly handle the case of inodes with nlink == 0
                     * in case they are being linked into the PENDING directory
                     */
            

            I will remove this check.
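
To make the interaction concrete, here is a rough userspace sketch of why a "nlink < 2" test misfires on very large directories. It is a simplified model under stated assumptions, not the ldiskfs or LMV code: the names sketch_dir_nlink and sketch_old_check_rejects are hypothetical, the 65000 limit is the value quoted in the comment above, and the exact overflow boundary used here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Limit quoted in the DIR_NLINK comment; the name is made up for this sketch. */
#define SKETCH_LINK_MAX 65000UL

/*
 * Simplified model (assumption) of the link count reported for a directory
 * holding nsubdirs subdirectories with DIR_NLINK enabled: normally
 * 2 + nsubdirs, but "1" once the count would exceed the limit.
 */
unsigned int sketch_dir_nlink(unsigned long nsubdirs)
{
	unsigned long nlink = nsubdirs + 2;

	assert(nlink >= 2); /* guard against unsigned wrap-around */
	return nlink > SKETCH_LINK_MAX ? 1U : (unsigned int)nlink;
}

/* Model of the removed check: any nlink below 2 is treated as a bad stripe. */
bool sketch_old_check_rejects(unsigned int nlink)
{
	return nlink < 2;
}
```

Under this model a stripe holding, say, 250000 subdirectories (an illustrative figure) reports nlink == 1, so the old check would reject a perfectly valid stripe with -ENOENT, while a small directory with 100 subdirectories reports 102 and passes.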


            simmonsja James A Simmons added a comment -

            The first run of mdtest takes a while before it fails. Once it fails, you can reproduce the failure by running rm -rf on the files left over from mdtest.

            I attached the logs for my latest test from the client node and all the MDS servers I have.

            di.wang Di Wang added a comment -

            James: thanks. How soon do you usually hit the failure? After a few minutes, or a few hours after starting the test?

            simmonsja James A Simmons added a comment - I did you one better. Grab my source rpm at http://www.infradead.org/~jsimmons/lustre-2.7.59-1_g703195a.src.rpm
            di.wang Di Wang added a comment -

            OK, I tried to reproduce it on the OpenSFS cluster with 8 MDTs (4 MDS), 4 OSTs (2 OSS), and 9 clients. I just started the test; it has been an hour and I still cannot see this problem. I will check tomorrow morning to see how it goes.

            James: could you please tell me all of your patches (based on master)? Thanks.

            People

              di.wang Di Wang
              simmonsja James A Simmons