[LU-6088] racer test_1: dir_create.sh mutex deadlock in sys_open->do_lookup

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.7.0
    • Lustre 2.7.0
    • 3
    • 16947

    Description

      This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/9ec8122a-9608-11e4-af28-5254006e85c2.

      The sub-test test_1 failed with the following logs on the client console:

      INFO: task dir_create.sh:5686 was blocked for more than 120s
      Call trace:
      mutex_lock+0x2b/0x50
      do_lookup+0x11b/0x230
      __link_path_walk+0x200/0x1000
      path_walk+0x6a/0xe0
      do_filp_open+0x1fa/0xd20
      do_sys_open+0x69/0x140
      sys_open+0x20/0x30
      

      This appears to be hit only when both the client and the server are running master (pre-2.7.0), so it is very likely related to DNE striped directories and is a regression on master (possibly due to the addition of a new racer test for striped directories?). Other combinations of 2.4/2.5/2.6/master clients and servers do not hit this problem.

      It would be nice to get the LU-4712 patch http://review.whamcloud.com/9689 landed to clean up the DNE striped directory console messages, but this case doesn't have the client oops, just stuck threads.

      Info required for matching: racer 1

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.7

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13432/
            Subject: LU-6088 lmv: Do not revalidate stripes with master lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 0860eda0544a507754b3af4aadcd651e1120ded5

            di.wang Di Wang added a comment - - edited

            With this patch and a minor change to racer, racer passes in my local test setup (4 MDTs, 2 OSTs).

            diff --git a/lustre/tests/cfg/local.sh b/lustre/tests/cfg/local.sh
            index 6d16312..73c85a3 100644
            --- a/lustre/tests/cfg/local.sh
            +++ b/lustre/tests/cfg/local.sh
            @@ -13,7 +13,7 @@ TMP=${TMP:-/tmp}
             DAEMONSIZE=${DAEMONSIZE:-500}
             MDSCOUNT=${MDSCOUNT:-1}
             MDSDEVBASE=${MDSDEVBASE:-$TMP/${FSNAME}-mdt}
            -MDSSIZE=${MDSSIZE:-200000}
            +MDSSIZE=${MDSSIZE:-2000000}
             #
             # Format options of facets can be specified with these variables:
             #
            @@ -39,7 +39,7 @@ MGS_MOUNT_OPTS=${MGS_MOUNT_OPTS:-}
            
             OSTCOUNT=${OSTCOUNT:-2}
             OSTDEVBASE=${OSTDEVBASE:-$TMP/${FSNAME}-ost}
            -OSTSIZE=${OSTSIZE:-200000}
            +OSTSIZE=${OSTSIZE:-2000000}
             OSTOPT=${OSTOPT:-}
             OST_FS_MKFS_OPTS=${OST_FS_MKFS_OPTS:-}
             OST_MOUNT_OPTS=${OST_MOUNT_OPTS:-}
            diff --git a/lustre/tests/racer/racer.sh b/lustre/tests/racer/racer.sh
            index deef18e..3ed624e 100755
            --- a/lustre/tests/racer/racer.sh
            +++ b/lustre/tests/racer/racer.sh
            @@ -17,7 +17,7 @@ file_list file_concat file_exec file_chown file_chmod file_mknod file_truncate \
             file_delxattr file_getxattr file_setxattr"
            
             if [ $MDSCOUNT -gt 1 ]; then
            -       RACER_PROGS="${RACER_PROGS} dir_remote dir_migrate"
            +       RACER_PROGS="${RACER_PROGS} dir_remote"
             fi
            
             racer_cleanup()
            
            == racer test 1: racer on clients: testnode DURATION=300 == 17:12:47 (1421284367)
            racers pids: 77271 77272 77273 77275 77278 77281 77285 77289
            ./file_exec.sh: line 12: 86522 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 90169 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 95671 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 95382 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 98325 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 108967 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 115776 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 122252 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 122042 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 130326 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 12459 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 16796 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 36610 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 40410 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 41512 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 44574 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 54088 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 53171 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 55728 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 56052 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 58816 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 65907 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 68804 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 74136 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 73959 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 78464 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 93150 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 98385 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 102325 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 101154 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 104116 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 106750 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 113145 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 113147 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 113766 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 115832 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 117893 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 122696 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12:   421 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12:  9260 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 11327 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 11681 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 17643 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 20928 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 24589 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 38846 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 49343 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 64789 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 64736 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 89863 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 103920 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 105115 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 107463 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 116236 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 116222 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 117338 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 125041 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 126624 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            
            ./file_exec.sh: line 12:  5690 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 10494 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 21195 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 21185 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 28111 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 30436 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 37965 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 39235 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 42352 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 45837 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 46220 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 47790 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 52719 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 53661 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 53811 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 54609 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 55437 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 60614 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 65795 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 66303 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 71790 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 74574 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 80161 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 86993 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 86851 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 86691 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 90074 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 91221 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 88688 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 104944 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 110164 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 115085 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            ./file_exec.sh: line 12: 121941 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
            file_create.sh: no process killed
            dir_create.sh: no process killed
            file_rm.sh: no process killed
            file_create.sh: no process killed
            file_create.sh: no process killed
            dir_create.sh: no process killed
            file_rename.sh: no process killed
            dir_create.sh: no process killed
            file_rm.sh: no process killed
            file_link.sh: no process killed
            file_rm.sh: no process killed
            file_rename.sh: no process killed
            file_symlink.sh: no process killed
            file_create.sh: no process killed
            file_rename.sh: no process killed
            file_link.sh: no process killed
            file_create.sh: no process killed
            file_list.sh: no process killed
            file_create.sh: no process killed
            dir_create.sh: no process killed
            file_link.sh: no process killed
            file_symlink.sh: no process killed
            dir_create.sh: no process killed
            file_concat.sh: no process killed
            file_create.sh: no process killed
            file_symlink.sh: no process killed
            file_rm.sh: no process killed
            file_rm.sh: no process killed
            dir_create.sh: no process killed
            file_list.sh: no process killed
            file_create.sh: no process killed
            file_exec.sh: no process killed
            dir_create.sh: no process killed
            file_list.sh: no process killed
            file_rename.sh: no process killed
            file_concat.sh: no process killed
            file_rm.sh: no process killed
            file_rename.sh: no process killed
            file_chown.sh: no process killed
            dir_create.sh: no process killed
            file_rm.sh: no process killed
            file_concat.sh: no process killed
            file_link.sh: no process killed
            file_exec.sh: no process killed
            file_rename.sh: no process killed
            file_link.sh: no process killed
            file_chmod.sh: no process killed
            file_rm.sh: no process killed
            file_rename.sh: no process killed
            file_exec.sh: no process killed
            file_symlink.sh: no process killed
            file_mknod.sh: no process killed
            file_symlink.sh: no process killed
            file_link.sh: no process killed
            file_rename.sh: no process killed
            file_chown.sh: no process killed
            file_link.sh: no process killed
            file_chown.sh: no process killed
            file_truncate.sh: no process killed
            file_list.sh: no process killed
            file_list.sh: no process killed
            file_symlink.sh: no process killed
            file_link.sh: no process killed
            file_symlink.sh: no process killed
            file_chmod.sh: no process killed
            file_delxattr.sh: no process killed
            file_chmod.sh: no process killed
            file_concat.sh: no process killed
            file_list.sh: no process killed
            file_list.sh: no process killed
            file_concat.sh: no process killed
            file_symlink.sh: no process killed
            file_getxattr.sh: no process killed
            file_mknod.sh: no process killed
            file_mknod.sh: no process killed
            file_exec.sh: no process killed
            file_concat.sh: no process killed
            file_concat.sh: no process killed
            file_list.sh: no process killed
            file_exec.sh: no process killed
            file_setxattr.sh: no process killed
            file_truncate.sh: no process killed
            file_truncate.sh: no process killed
            file_chown.sh: no process killed
            file_exec.sh: no process killed
            file_delxattr.sh: no process killed
            file_concat.sh: no process killed
            file_chown.sh: no process killed
            file_exec.sh: no process killed
            file_delxattr.sh: no process killed
            file_chown.sh: no process killed
            dir_remote.sh: no process killed
            file_chmod.sh: no process killed
            file_chmod.sh: no process killed
            file_exec.sh: no process killed
            file_getxattr.sh: no process killed
            file_getxattr.sh: no process killed
            file_chown.sh: no process killed
            file_chmod.sh: no process killed
            file_mknod.sh: no process killed
            file_setxattr.sh: no process killed
            file_mknod.sh: no process killed
            file_setxattr.sh: no process killed
            file_mknod.sh: no process killed
            file_chown.sh: no process killed
            file_chmod.sh: no process killed
            file_truncate.sh: no process killed
            dir_remote.sh: no process killed
            dir_remote.sh: no process killed
            file_truncate.sh: no process killed
            file_truncate.sh: no process killed
            file_chmod.sh: no process killed
            file_mknod.sh: no process killed
            file_delxattr.sh: no process killed
            file_delxattr.sh: no process killed
            file_delxattr.sh: no process killed
            file_getxattr.sh: no process killed
            file_mknod.sh: no process killed
            file_getxattr.sh: no process killed
            file_getxattr.sh: no process killed
            file_setxattr.sh: no process killed
            file_truncate.sh: no process killed
            Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
            racer cleanup
            sleeping 5 sec ...
            there should be NO racer processes:
            USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
            Filesystem           1K-blocks   Used Available Use% Mounted on
            testnode@tcp:/lustre   3777312 143096   3417764   5% /mnt/lustre2
            We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
            Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
            racer cleanup
            sleeping 5 sec ...
            there should be NO racer processes:
            USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
            Filesystem           1K-blocks   Used Available Use% Mounted on
            testnode@tcp:/lustre   3777312 143096   3417764   5% /mnt/lustre2
            We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
            file_truncate.sh: no process killed
            file_setxattr.sh: no process killed
            file_setxattr.sh: no process killed
            file_delxattr.sh: no process killed
            dir_remote.sh: no process killed
            dir_remote.sh: no process killed
            Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
            racer cleanup
            sleeping 5 sec ...
            there should be NO racer processes:
            USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
            Filesystem           1K-blocks   Used Available Use% Mounted on
            testnode@tcp:/lustre   3777312 143096   3417764   5% /mnt/lustre2
            We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
            dir_remote.sh: no process killed
            file_delxattr.sh: no process killed
            file_getxattr.sh: no process killed
            file_getxattr.sh: no process killed
            file_setxattr.sh: no process killed
            file_setxattr.sh: no process killed
            dir_remote.sh: no process killed
            dir_remote.sh: no process killed
            Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
            racer cleanup
            sleeping 5 sec ...
            there should be NO racer processes:
            USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
            Filesystem           1K-blocks   Used Available Use% Mounted on
            testnode@tcp:/lustre   3777312 143096   3417764   5% /mnt/lustre
            We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
            Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
            racer cleanup
            sleeping 5 sec ...
            there should be NO racer processes:
            USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
            Filesystem           1K-blocks   Used Available Use% Mounted on
            testnode@tcp:/lustre   3777312 143096   3417764   5% /mnt/lustre
            We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
            pid=77271 rc=0
            Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
            racer cleanup
            sleeping 5 sec ...
            there should be NO racer processes:
            USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
            Filesystem           1K-blocks   Used Available Use% Mounted on
            testnode@tcp:/lustre   3777312 143096   3417764   5% /mnt/lustre
            We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
            pid=77272 rc=0
            Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
            racer cleanup
            sleeping 5 sec ...
            there should be NO racer processes:
            USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
            Filesystem           1K-blocks   Used Available Use% Mounted on
            testnode@tcp:/lustre   3777312 143096   3417764   5% /mnt/lustre2
            We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
            Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
            racer cleanup
            sleeping 5 sec ...
            there should be NO racer processes:
            USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
            Filesystem           1K-blocks   Used Available Use% Mounted on
            testnode@tcp:/lustre   3777312 143096   3417764   5% /mnt/lustre
            We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
            pid=77273 rc=0
            pid=77275 rc=0
            pid=77278 rc=0
            pid=77281 rc=0
            pid=77285 rc=0
            pid=77289 rc=0
            Resetting fail_loc on all nodes...done.
            PASS 1 (306s)
            == racer test complete, duration 308 sec == 17:17:53 (1421284673)
            Stopping clients: testnode /mnt/lustre2 (opts:)
            Stopping client testnode /mnt/lustre2 opts:
            [root@testnode tests]# 
            [root@testnode tests]# MDSCOUNT=4 sh llmountcleanup.sh 
            Stopping clients: testnode /mnt/lustre (opts:-f)
            Stopping client testnode /mnt/lustre opts:-f
            Stopping clients: testnode /mnt/lustre2 (opts:-f)
            Stopping /mnt/mds1 (opts:-f) on testnode
            Stopping /mnt/mds2 (opts:-f) on testnode
            Stopping /mnt/mds3 (opts:-f) on testnode
            Stopping /mnt/mds4 (opts:-f) on testnode
            Stopping /mnt/ost1 (opts:-f) on testnode
            Stopping /mnt/ost2 (opts:-f) on testnode
            modules unloaded.
            

            Note: I ran racer on a node with 8 cores and 8 GB of memory.
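
The local.sh hunks in the diff above only change the fallback values in assignments of the form `${MDSSIZE:-200000}`. Because that shell expansion keeps any value already set in the environment, the larger sizes could probably be supplied by the caller instead of by editing the file. The sketch below only demonstrates the expansion behaviour; it does not invoke any Lustre script, and the helper names `default_mdssize` and `override_mdssize` are made up for illustration.

```shell
#!/bin/sh
# Illustration only: reproduces the "${VAR:-default}" pattern seen in
# lustre/tests/cfg/local.sh without running any Lustre test script.

# Case 1: the variable is unset, so the built-in default applies.
unset MDSSIZE
MDSSIZE=${MDSSIZE:-200000}
default_mdssize=$MDSSIZE

# Case 2: the caller has already set the variable, so the default is ignored.
MDSSIZE=2000000
MDSSIZE=${MDSSIZE:-200000}
override_mdssize=$MDSSIZE

echo "default:  $default_mdssize"
echo "override: $override_mdssize"
```

Under that assumption, prefixing the setup command with `MDSSIZE=2000000 OSTSIZE=2000000`, in the same way `MDSCOUNT=4` is passed to llmountcleanup.sh in the log above, should have the same effect as the first two hunks. Dropping dir_migrate from RACER_PROGS still requires the edit to racer.sh shown in the diff.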

error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 110164 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 115085 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 121941 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null file_create.sh: no process killed dir_create.sh: no process killed file_rm.sh: no process killed file_create.sh: no process killed file_create.sh: no process killed dir_create.sh: no process killed file_rename.sh: no process killed dir_create.sh: no process killed file_rm.sh: no process killed file_link.sh: no process killed file_rm.sh: no process killed file_rename.sh: no process killed file_symlink.sh: no process killed file_create.sh: no process killed file_rename.sh: no process killed file_link.sh: no process killed file_create.sh: no process killed file_list.sh: no process killed file_create.sh: no process killed dir_create.sh: no process killed file_link.sh: no process killed file_symlink.sh: no process killed dir_create.sh: no process killed file_concat.sh: no process killed file_create.sh: no process killed file_symlink.sh: no process killed file_rm.sh: no process killed file_rm.sh: no process killed dir_create.sh: no process killed file_list.sh: no process killed file_create.sh: no process killed file_exec.sh: no process killed dir_create.sh: no process killed file_list.sh: no process killed file_rename.sh: no process killed file_concat.sh: no process killed file_rm.sh: no process killed file_rename.sh: no process killed file_chown.sh: no process killed dir_create.sh: no process killed file_rm.sh: no process killed file_concat.sh: no process killed file_link.sh: no process killed file_exec.sh: no process killed file_rename.sh: no process killed file_link.sh: no process killed file_chmod.sh: no process killed file_rm.sh: no process killed file_rename.sh: no process killed file_exec.sh: no process killed file_symlink.sh: no process 
killed file_mknod.sh: no process killed file_symlink.sh: no process killed file_link.sh: no process killed file_rename.sh: no process killed file_chown.sh: no process killed file_link.sh: no process killed file_chown.sh: no process killed file_truncate.sh: no process killed file_list.sh: no process killed file_list.sh: no process killed file_symlink.sh: no process killed file_link.sh: no process killed file_symlink.sh: no process killed file_chmod.sh: no process killed file_delxattr.sh: no process killed file_chmod.sh: no process killed file_concat.sh: no process killed file_list.sh: no process killed file_list.sh: no process killed file_concat.sh: no process killed file_symlink.sh: no process killed file_getxattr.sh: no process killed file_mknod.sh: no process killed file_mknod.sh: no process killed file_exec.sh: no process killed file_concat.sh: no process killed file_concat.sh: no process killed file_list.sh: no process killed file_exec.sh: no process killed file_setxattr.sh: no process killed file_truncate.sh: no process killed file_truncate.sh: no process killed file_chown.sh: no process killed file_exec.sh: no process killed file_delxattr.sh: no process killed file_concat.sh: no process killed file_chown.sh: no process killed file_exec.sh: no process killed file_delxattr.sh: no process killed file_chown.sh: no process killed dir_remote.sh: no process killed file_chmod.sh: no process killed file_chmod.sh: no process killed file_exec.sh: no process killed file_getxattr.sh: no process killed file_getxattr.sh: no process killed file_chown.sh: no process killed file_chmod.sh: no process killed file_mknod.sh: no process killed file_setxattr.sh: no process killed file_mknod.sh: no process killed file_setxattr.sh: no process killed file_mknod.sh: no process killed file_chown.sh: no process killed file_chmod.sh: no process killed file_truncate.sh: no process killed dir_remote.sh: no process killed dir_remote.sh: no process killed file_truncate.sh: no process killed 
file_truncate.sh: no process killed file_chmod.sh: no process killed file_mknod.sh: no process killed file_delxattr.sh: no process killed file_delxattr.sh: no process killed file_delxattr.sh: no process killed file_getxattr.sh: no process killed file_mknod.sh: no process killed file_getxattr.sh: no process killed file_getxattr.sh: no process killed file_setxattr.sh: no process killed file_truncate.sh: no process killed Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre2 We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre2 We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. file_truncate.sh: no process killed file_setxattr.sh: no process killed file_setxattr.sh: no process killed file_delxattr.sh: no process killed dir_remote.sh: no process killed dir_remote.sh: no process killed Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre2 We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. 
dir_remote.sh: no process killed file_delxattr.sh: no process killed file_getxattr.sh: no process killed file_getxattr.sh: no process killed file_setxattr.sh: no process killed file_setxattr.sh: no process killed dir_remote.sh: no process killed dir_remote.sh: no process killed Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. pid=77271 rc=0 Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. pid=77272 rc=0 Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre2 We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. 
Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. pid=77273 rc=0 pid=77275 rc=0 pid=77278 rc=0 pid=77281 rc=0 pid=77285 rc=0 pid=77289 rc=0 Resetting fail_loc on all nodes...done. PASS 1 (306s) == racer test complete, duration 308 sec == 17:17:53 (1421284673) Stopping clients: testnode /mnt/lustre2 (opts:) Stopping client testnode /mnt/lustre2 opts: [root@testnode tests]# [root@testnode tests]# MDSCOUNT=4 sh llmountcleanup.sh Stopping clients: testnode /mnt/lustre (opts:-f) Stopping client testnode /mnt/lustre opts:-f Stopping clients: testnode /mnt/lustre2 (opts:-f) Stopping /mnt/mds1 (opts:-f) on testnode Stopping /mnt/mds2 (opts:-f) on testnode Stopping /mnt/mds3 (opts:-f) on testnode Stopping /mnt/mds4 (opts:-f) on testnode Stopping /mnt/ost1 (opts:-f) on testnode Stopping /mnt/ost2 (opts:-f) on testnode modules unloaded. Note: I run racer with 8 cores and 8G memory.

            gerrit Gerrit Updater added a comment -

            wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/13432
            Subject: LU-6088 lmv: Do not revalidate strips with master lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 715252971aa4839930492520a72eb50b4fa2936b

            di.wang Di Wang added a comment -

            Hmm, this trace caught my attention:

            ls            S 0000000000000001     0 33694      1 0x00000080
             ffff8801d21fd2f8 0000000000000082 ffff8801d21fd288 ffffffffa092ac0c
             ffffffffa0a0b460 ffff8801e5393000 ffff8801d21fd268 ffffffffa093d4f5
             ffff8801d21fd2f8 ffffffffa0964af2 ffff8801e5ed25f8 ffff8801d21fdfd8
            Call Trace:
             [<ffffffffa092ac0c>] ? ptlrpc_request_bufs_pack+0x5c/0x80 [ptlrpc]
             [<ffffffffa093d4f5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
             [<ffffffffa0964af2>] ? __req_capsule_get+0x162/0x6d0 [ptlrpc]
             [<ffffffffa0941d40>] ? lustre_swab_mdt_body+0x0/0x140 [ptlrpc]
             [<ffffffffa06e8fe4>] obd_get_request_slot+0x1a4/0x280 [obdclass]
             [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
             [<ffffffffa0ba11a5>] mdc_enqueue+0x275/0x1a40 [mdc]
             [<ffffffffa0b9f25b>] ? mdc_lock_match+0xbb/0x170 [mdc]
             [<ffffffffa0ba2b52>] mdc_intent_lock+0x1e2/0x5f9 [mdc]
             [<ffffffffa1174af0>] ? ll_md_blocking_ast+0x0/0x7f0 [lustre]
             [<ffffffffa0912840>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
             [<ffffffffa0b59b32>] lmv_revalidate_slaves+0x482/0x1130 [lmv]
             [<ffffffffa1174af0>] ? ll_md_blocking_ast+0x0/0x7f0 [lustre]
             [<ffffffffa0b40a7a>] lmv_update_lsm_md+0x1a/0x20 [lmv]
             [<ffffffffa11562da>] ll_update_inode+0x134a/0x1e60 [lustre]
             [<ffffffffa0b5c3d1>] ? lmv_fld_lookup+0xf1/0x440 [lmv]
             [<ffffffff8129456a>] ? strlcpy+0x4a/0x60
             [<ffffffffa1156e78>] ll_read_inode2+0x88/0x470 [lustre]
             [<ffffffffa11720fb>] ll_iget+0x13b/0x3c0 [lustre]
             [<ffffffffa0b3e4b8>] ? lmv_get_lustre_md+0x88/0x300 [lmv]
             [<ffffffffa1164fe5>] ll_prep_inode+0x6c5/0xe80 [lustre]
             [<ffffffffa0929c4f>] ? ptlrpc_request_cache_free+0xbf/0x100 [ptlrpc]
             [<ffffffffa0b59064>] ? lmv_intent_remote+0x444/0xa90 [lmv]
             [<ffffffffa0941d40>] ? lustre_swab_mdt_body+0x0/0x140 [ptlrpc]
             [<ffffffffa11755d1>] ll_lookup_it_finish+0x2f1/0x11b0 [lustre]
             [<ffffffff811749e3>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
             [<ffffffffa1171ed9>] ? ll_i2suppgid+0x19/0x30 [lustre]
             [<ffffffffa115748c>] ? ll_prep_md_op_data+0x22c/0x530 [lustre]
             [<ffffffffa1174af0>] ? ll_md_blocking_ast+0x0/0x7f0 [lustre]
             [<ffffffffa1176737>] ll_lookup_it+0x2a7/0x9a0 [lustre]
             [<ffffffffa1176eb9>] ll_lookup_nd+0x89/0x5e0 [lustre]
             [<ffffffff8119e0f5>] do_lookup+0x1a5/0x230
             [<ffffffff8119ed84>] __link_path_walk+0x7a4/0x1000
             [<ffffffff8114f89f>] ? handle_pte_fault+0x4af/0xb00
             [<ffffffff8119f89a>] path_walk+0x6a/0xe0
             [<ffffffff8119faab>] filename_lookup+0x6b/0xc0
             [<ffffffff8122db26>] ? security_file_alloc+0x16/0x20
             [<ffffffff811a0f84>] do_filp_open+0x104/0xd20
             [<ffffffff8129980a>] ? strncpy_from_user+0x4a/0x90
             [<ffffffff811ae432>] ? alloc_fd+0x92/0x160
             [<ffffffff8118b237>] do_sys_open+0x67/0x130
             [<ffffffff8118b340>] sys_open+0x20/0x30
             [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            

            Right now, when the slave lock is being revalidated (enqueue etc.), we do not release the master lock, which is probably not right. I will cook up a patch.
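
            To illustrate the pattern in question, here is a minimal user-space sketch, not the actual lmv code. It assumes plain Python mutexes as a loose analogy for the master lock (the real ones are LDLM locks), and every name in it (Stripe, enqueue_stripe_lock, revalidate_stripes) is hypothetical. The idea is to snapshot what needs revalidating while holding the master lock, release it, and only then make the calls that may block.

```python
import threading


class Stripe:
    """Hypothetical stand-in for one slave stripe of a striped directory."""

    def __init__(self, name):
        self.name = name
        self.valid = False
        self.master_held_during_enqueue = None


def enqueue_stripe_lock(stripe, master_lock):
    """Stand-in for a call that may block (e.g. an enqueue waiting for a slot)."""
    # Record whether the master lock was held while we "blocked".
    stripe.master_held_during_enqueue = master_lock.locked()
    stripe.valid = True


def revalidate_stripes(master_lock, stripes):
    """Revalidate stale stripes without holding master_lock across blocking calls."""
    # Step 1: snapshot the stale stripes under the master lock.
    with master_lock:
        stale = [s for s in stripes if not s.valid]

    # Step 2: do the potentially blocking work with the master lock released.
    for stripe in stale:
        enqueue_stripe_lock(stripe, master_lock)

    # Step 3: retake the master lock only to read the combined result.
    with master_lock:
        return all(s.valid for s in stripes)
```

            In this sketch a thread stuck in the enqueue step cannot keep other threads from taking the master lock, which is the property the trace above suggests is missing. Whether the real fix takes this exact shape is an assumption.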


            jay Jinshan Xiong (Inactive) added a comment -

            The same stack trace has been seen in LU-6085

            People

              di.wang Di Wang
              maloo Maloo