The ticket for tracking all DNE2 bugs

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.8.0, Lustre 2.9.0
    • 3

    Description

      This ticket tracks all DNE2 bugs.

          Activity

            [LU-6831] The ticket for tracking all DNE2 bugs

            simmonsja James A Simmons added a comment - I attached my client logs to LU-6984.

            gerrit Gerrit Updater added a comment -
            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15720/
            Subject: LU-6831 lmv: revalidate the dentry for striped dir
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a17909a92da74cb26fb9bf2824f968b2adf0897e

            simmonsja James A Simmons added a comment - Testing to see if the problem exists on a directory striped across 8 MDS servers. Waiting for the results. I will push some log data for you soon.

            di.wang Di Wang (Inactive) added a comment - James: Any news on this -2 problem? Thanks.

            di.wang Di Wang (Inactive) added a comment - James: no, I did not see these errors. Could you please collect a -1 debug log on the client side when you remove one of these files? Thanks.
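
Editor's note: the "-1 debug log" requested above is typically gathered on the client with `lctl`. The following is a minimal sketch, assuming the standard `lctl set_param debug=-1`, `lctl clear`, and `lctl dk` commands. The target file and the log path are hypothetical placeholders, not paths from this ticket, and nothing is executed unless `dry_run=False` is passed (which would require root on a Lustre client).

```python
import shlex
import subprocess


def debug_log_commands(target, log_path="/tmp/lustre-debug.log"):
    """Return the command sequence for capturing a full client debug log.

    `target` and `log_path` are hypothetical placeholders.
    """
    return [
        ["lctl", "set_param", "debug=-1"],  # enable every debug flag
        ["lctl", "clear"],                  # empty the kernel debug buffer
        ["rm", target],                     # reproduce the failing removal
        ["lctl", "dk", log_path],           # dump the debug buffer to a file
    ]


def collect_debug_log(target, log_path="/tmp/lustre-debug.log", dry_run=True):
    """Print each command; run it only when dry_run is False."""
    commands = debug_log_commands(target, log_path)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=False: the rm is expected to fail while the bug reproduces
            subprocess.run(cmd, check=False)
    return commands
```

Calling `collect_debug_log("/path/to/one/leftover/file")` prints the four commands in order. Restoring the previous debug mask afterwards is left out of this sketch.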

            simmonsja James A Simmons added a comment -
            An update on my latest testing. I'm still seeing problems when creating 1 million+ files per directory. Clearing out the debug logs, I see the problem is only on the client side. When running an application I see:

            command line used: /lustre/sultan/stf008/scratch/jsimmons/mdtest -I 100000 -i 5 -d /lustre/sultan/stf008/scratch/jsimmons/dne2_4_mds_md_test/shared_1000k_10
            Path: /lustre/sultan/stf008/scratch/jsimmons/dne2_4_mds_md_test
            FS: 21.8 TiB Used FS: 0.2% Inodes: 58.7 Mi Used Inodes: 4.6%

            10 tasks, 1000000 files/directories
            aprun: Apid 3172: Caught signal Window changed, sending to application
            08/03/2015 10:34:45: Process 0(nid00028): FAILED in create_remove_directory_tree, Unable to remove directory: No such file or directory
            Rank 0 [Mon Aug 3 10:34:45 2015] [c0-0c0s1n2] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
            _pmiu_daemon(SIGCHLD): [NID 00028] [c0-0c0s1n2] [Mon Aug 3 10:34:45 2015] PE RANK 0 exit signal Aborted
            aprun: Apid 3172: Caught signal Interrupt, sending to application
            _pmiu_daemon(SIGCHLD): [NID 00012] [c0-0c0s6n0] [Mon Aug 3 10:50:50 2015] PE RANK 7 exit signal Interrupt
            _pmiu_daemon(SIGCHLD): [NID 00018] [c0-0c0s6n2] [Mon Aug 3 10:50:50 2015] PE RANK 9 exit signal Interrupt
            _pmiu_daemon(SIGCHLD): [NID 00013] [c0-0c0s6n1] [Mon Aug 3 10:50:50 2015] PE RANK 8 exit signal Interrupt

            After the test failed, any attempt to remove the files created by this test fails. When I attempt to remove the files, I see the following errors in dmesg:

            LustreError: 5430:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2
            LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2
            LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) Skipped 7 previous similar messages
            LustreError: 5451:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2

            DiWang, have you seen these errors during your testing?
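
Editor's note: the `rc -2` in the dmesg lines above follows the kernel convention of returning a negated errno value, and errno 2 is ENOENT, which is consistent with the "No such file or directory" failure that mdtest reports. The sketch below shows one way to pull the function name and return code out of such lines. The regular expression is an assumption based only on the four lines quoted in this comment, not a general Lustre log grammar.

```python
import errno
import os
import re

# Matches lines such as:
# LustreError: 5430:0:(llite_lib.c:2286:ll_prep_inode()) new_inode -fatal: rc -2
LINE_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)"
    r".*\brc (?P<rc>-?\d+)"
)


def decode(line):
    """Return (function, rc, errno name, message), or None if the line has no rc."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    code = abs(rc)  # kernel code reports errors as negative errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return m.group("func"), rc, name, os.strerror(code)
```

Applied to the first line above, `decode` reports `ll_prep_inode`, `-2`, and `ENOENT`; the "Skipped 7 previous similar messages" line carries no return code and yields `None`.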

            di.wang Di Wang (Inactive) added a comment - Sorry, it might be a mistake; the patch on this ticket has not even landed.

            simmonsja James A Simmons added a comment - The patch for this ticket landed, but I would like to see this kept open to handle any further bug reports.

            jessica Jessica A. Popp (Inactive) added a comment (edited) -
            Translating James' list to ticket numbers for tracking purposes:
            LU-6427
            LU-6586
            LU-6819
            LU-6831
            LU-6840
            LU-6846
            LU-6874
            LU-6875
            LU-6880
            LU-6881
            LU-6896
            LU-6904
            LU-6906
            LU-6916

            simmonsja James A Simmons added a comment -
            For my DNE2 testing, here is the list of patches I am running against:
            http://review.whamcloud.com/#/c/14346
            http://review.whamcloud.com/#/c/14747
            http://review.whamcloud.com/#/c/15594
            http://review.whamcloud.com/#/c/15720
            http://review.whamcloud.com/#/c/15576
            http://review.whamcloud.com/#/c/15730
            http://review.whamcloud.com/#/c/15692
            http://review.whamcloud.com/#/c/15691
            http://review.whamcloud.com/#/c/15682
            http://review.whamcloud.com/#/c/15690
            http://review.whamcloud.com/#/c/15721
            http://review.whamcloud.com/#/c/15724
            http://review.whamcloud.com/#/c/15728
            http://review.whamcloud.com/#/c/15770

            simmonsja James A Simmons added a comment - Yes, LU-6831 helped with the revalidate FID bug.

            People

              di.wang Di Wang (Inactive)
              Votes: 0
              Watchers: 14