Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.7.0
-
3
-
16947
Description
This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/9ec8122a-9608-11e4-af28-5254006e85c2.
The sub-test test_1 failed with the following logs on the client console:
INFO: task dir_create.sh:5686 was blocked for more than 120s
Call trace:
mutex_lock+0x2b/0x50
do_lookup+0x11b/0x230
__link_path_walk+0x200/0x1000
path_walk+0x6a/0xe0
do_filp_open+0x1fa/0xd20
do_sys_open+0x69/0x140
sys_open+0x20/0x30
It looks like this is only being hit with both a master client and a master server (pre-2.7.0), so it is very likely related to DNE striped directories and is a regression on master (possibly due to the addition of a new racer test for striped directories?). Other combinations of 2.4/2.5/2.6/master client or server do not hit this problem.
It would be nice to get the LU-4712 patch http://review.whamcloud.com/9689 landed to clean up the DNE striped directory console messages, but this case doesn't have the client oops, just stuck threads.
Info required for matching: racer 1
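For triaging similar failures, the hung-task signature quoted above can be pulled out of a saved console log with a small filter. This is only a minimal sketch, assuming the log is plain text on stdin; the function name find_hung_tasks is hypothetical and is not part of the Lustre test framework. The pattern accepts the wording shown in this ticket as well as the variant without "was".

```shell
# Hypothetical helper: print "<task name> <pid>" for every task that the
# kernel reported as blocked. Reads console log text from stdin.
find_hung_tasks() {
    # Matches lines such as:
    #   INFO: task dir_create.sh:5686 was blocked for more than 120s
    # \1 = task name, \2 = pid, \3 = optional "was " (unused).
    sed -n 's/^.*INFO: task \([^ ]*\):\([0-9][0-9]*\) \(was \)\{0,1\}blocked for more than.*$/\1 \2/p'
}
```

Usage would be along the lines of `find_hung_tasks < console.log`, where console.log is whatever file the client console was captured to.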
Attachments
Issue Links
- is duplicated by: LU-5936 lmv_merge_attr() and callees ignore i_blocks (Resolved)
Activity
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13432/
Subject: LU-6088 lmv: Do not revalidate stripes with master lock
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0860eda0544a507754b3af4aadcd651e1120ded5
With this patch and a minor change to racer, racer passes in my local test setup (4 MDTs, 2 OSTs).
diff --git a/lustre/tests/cfg/local.sh b/lustre/tests/cfg/local.sh
index 6d16312..73c85a3 100644
--- a/lustre/tests/cfg/local.sh
+++ b/lustre/tests/cfg/local.sh
@@ -13,7 +13,7 @@ TMP=${TMP:-/tmp}
 DAEMONSIZE=${DAEMONSIZE:-500}
 MDSCOUNT=${MDSCOUNT:-1}
 MDSDEVBASE=${MDSDEVBASE:-$TMP/${FSNAME}-mdt}
-MDSSIZE=${MDSSIZE:-200000}
+MDSSIZE=${MDSSIZE:-2000000}
 #
 # Format options of facets can be specified with these variables:
 #
@@ -39,7 +39,7 @@ MGS_MOUNT_OPTS=${MGS_MOUNT_OPTS:-}
 OSTCOUNT=${OSTCOUNT:-2}
 OSTDEVBASE=${OSTDEVBASE:-$TMP/${FSNAME}-ost}
-OSTSIZE=${OSTSIZE:-200000}
+OSTSIZE=${OSTSIZE:-2000000}
 OSTOPT=${OSTOPT:-}
 OST_FS_MKFS_OPTS=${OST_FS_MKFS_OPTS:-}
 OST_MOUNT_OPTS=${OST_MOUNT_OPTS:-}
diff --git a/lustre/tests/racer/racer.sh b/lustre/tests/racer/racer.sh
index deef18e..3ed624e 100755
--- a/lustre/tests/racer/racer.sh
+++ b/lustre/tests/racer/racer.sh
@@ -17,7 +17,7 @@ file_list file_concat file_exec file_chown file_chmod file_mknod file_truncate \
 file_delxattr file_getxattr file_setxattr"
 if [ $MDSCOUNT -gt 1 ]; then
-	RACER_PROGS="${RACER_PROGS} dir_remote dir_migrate"
+	RACER_PROGS="${RACER_PROGS} dir_remote"
 fi
 racer_cleanup()
== racer test 1: racer on clients: testnode DURATION=300 == 17:12:47 (1421284367) racers pids: 77271 77272 77273 77275 77278 77281 77285 77289 ./file_exec.sh: line 12: 86522 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 90169 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 95671 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 95382 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 98325 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 108967 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 115776 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 122252 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 122042 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 130326 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 12459 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 16796 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 36610 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 40410 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 41512 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 44574 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 54088 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 53171 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 55728 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 56052 Bus error (core dumped) 
$DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 58816 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 65907 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 68804 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 74136 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 73959 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 78464 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 93150 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 98385 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 102325 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 101154 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 104116 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 106750 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 113145 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 113147 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 113766 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 115832 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 117893 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 122696 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 421 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 9260 Bus error (core dumped) $DIR/$file 
0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 11327 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 11681 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 17643 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 20928 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 24589 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 38846 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 49343 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 64789 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 64736 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 89863 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 103920 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 105115 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 107463 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 116236 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 116222 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 117338 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 125041 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 126624 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 5690 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 10494 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 21195 
Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 21185 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 28111 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 30436 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 37965 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 39235 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 42352 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 45837 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 46220 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 47790 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 52719 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 53661 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 53811 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 54609 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 55437 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 60614 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 65795 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 66303 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 71790 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 74574 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 
12: 80161 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 86993 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 86851 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 86691 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 90074 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 91221 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 88688 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 104944 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 110164 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 115085 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 12: 121941 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null file_create.sh: no process killed dir_create.sh: no process killed file_rm.sh: no process killed file_create.sh: no process killed file_create.sh: no process killed dir_create.sh: no process killed file_rename.sh: no process killed dir_create.sh: no process killed file_rm.sh: no process killed file_link.sh: no process killed file_rm.sh: no process killed file_rename.sh: no process killed file_symlink.sh: no process killed file_create.sh: no process killed file_rename.sh: no process killed file_link.sh: no process killed file_create.sh: no process killed file_list.sh: no process killed file_create.sh: no process killed dir_create.sh: no process killed file_link.sh: no process killed file_symlink.sh: no process killed dir_create.sh: no process killed file_concat.sh: no process killed file_create.sh: no process killed file_symlink.sh: no process killed file_rm.sh: no process killed file_rm.sh: no process killed dir_create.sh: no process killed 
file_list.sh: no process killed file_create.sh: no process killed file_exec.sh: no process killed dir_create.sh: no process killed file_list.sh: no process killed file_rename.sh: no process killed file_concat.sh: no process killed file_rm.sh: no process killed file_rename.sh: no process killed file_chown.sh: no process killed dir_create.sh: no process killed file_rm.sh: no process killed file_concat.sh: no process killed file_link.sh: no process killed file_exec.sh: no process killed file_rename.sh: no process killed file_link.sh: no process killed file_chmod.sh: no process killed file_rm.sh: no process killed file_rename.sh: no process killed file_exec.sh: no process killed file_symlink.sh: no process killed file_mknod.sh: no process killed file_symlink.sh: no process killed file_link.sh: no process killed file_rename.sh: no process killed file_chown.sh: no process killed file_link.sh: no process killed file_chown.sh: no process killed file_truncate.sh: no process killed file_list.sh: no process killed file_list.sh: no process killed file_symlink.sh: no process killed file_link.sh: no process killed file_symlink.sh: no process killed file_chmod.sh: no process killed file_delxattr.sh: no process killed file_chmod.sh: no process killed file_concat.sh: no process killed file_list.sh: no process killed file_list.sh: no process killed file_concat.sh: no process killed file_symlink.sh: no process killed file_getxattr.sh: no process killed file_mknod.sh: no process killed file_mknod.sh: no process killed file_exec.sh: no process killed file_concat.sh: no process killed file_concat.sh: no process killed file_list.sh: no process killed file_exec.sh: no process killed file_setxattr.sh: no process killed file_truncate.sh: no process killed file_truncate.sh: no process killed file_chown.sh: no process killed file_exec.sh: no process killed file_delxattr.sh: no process killed file_concat.sh: no process killed file_chown.sh: no process killed file_exec.sh: no process killed 
file_delxattr.sh: no process killed file_chown.sh: no process killed dir_remote.sh: no process killed file_chmod.sh: no process killed file_chmod.sh: no process killed file_exec.sh: no process killed file_getxattr.sh: no process killed file_getxattr.sh: no process killed file_chown.sh: no process killed file_chmod.sh: no process killed file_mknod.sh: no process killed file_setxattr.sh: no process killed file_mknod.sh: no process killed file_setxattr.sh: no process killed file_mknod.sh: no process killed file_chown.sh: no process killed file_chmod.sh: no process killed file_truncate.sh: no process killed dir_remote.sh: no process killed dir_remote.sh: no process killed file_truncate.sh: no process killed file_truncate.sh: no process killed file_chmod.sh: no process killed file_mknod.sh: no process killed file_delxattr.sh: no process killed file_delxattr.sh: no process killed file_delxattr.sh: no process killed file_getxattr.sh: no process killed file_mknod.sh: no process killed file_getxattr.sh: no process killed file_getxattr.sh: no process killed file_setxattr.sh: no process killed file_truncate.sh: no process killed Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre2 We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre2 We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. 
file_truncate.sh: no process killed file_setxattr.sh: no process killed file_setxattr.sh: no process killed file_delxattr.sh: no process killed dir_remote.sh: no process killed dir_remote.sh: no process killed Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre2 We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. dir_remote.sh: no process killed file_delxattr.sh: no process killed file_getxattr.sh: no process killed file_getxattr.sh: no process killed file_setxattr.sh: no process killed file_setxattr.sh: no process killed dir_remote.sh: no process killed dir_remote.sh: no process killed Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... there should be NO racer processes: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. pid=77271 rc=0 Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit racer cleanup sleeping 5 sec ... 
there should be NO racer processes:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Filesystem 1K-blocks Used Available Use% Mounted on
testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre
We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
pid=77272 rc=0
Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
racer cleanup
sleeping 5 sec ...
there should be NO racer processes:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Filesystem 1K-blocks Used Available Use% Mounted on
testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre2
We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
racer cleanup
sleeping 5 sec ...
there should be NO racer processes:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Filesystem 1K-blocks Used Available Use% Mounted on
testnode@tcp:/lustre 3777312 143096 3417764 5% /mnt/lustre
We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
pid=77273 rc=0
pid=77275 rc=0
pid=77278 rc=0
pid=77281 rc=0
pid=77285 rc=0
pid=77289 rc=0
Resetting fail_loc on all nodes...done.
PASS 1 (306s)
== racer test complete, duration 308 sec == 17:17:53 (1421284673)
Stopping clients: testnode /mnt/lustre2 (opts:)
Stopping client testnode /mnt/lustre2 opts:
[root@testnode tests]#
[root@testnode tests]# MDSCOUNT=4 sh llmountcleanup.sh
Stopping clients: testnode /mnt/lustre (opts:-f)
Stopping client testnode /mnt/lustre opts:-f
Stopping clients: testnode /mnt/lustre2 (opts:-f)
Stopping /mnt/mds1 (opts:-f) on testnode
Stopping /mnt/mds2 (opts:-f) on testnode
Stopping /mnt/mds3 (opts:-f) on testnode
Stopping /mnt/mds4 (opts:-f) on testnode
Stopping /mnt/ost1 (opts:-f) on testnode
Stopping /mnt/ost2 (opts:-f) on testnode
modules unloaded.
Note: I ran racer on a node with 8 cores and 8 GB of memory.
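The local.sh change in this comment only raises defaults: both settings use the ${VAR:-default} expansion, so the larger sizes can in principle be supplied from the environment instead of editing the file. The following is a minimal sketch of that idiom only; show_sizes is a hypothetical stand-in for the two affected lines of cfg/local.sh and is not part of the test framework.

```shell
# Hypothetical stand-in for the two settings touched in cfg/local.sh.
show_sizes() {
    # Same expansion as local.sh: keep a caller-supplied value if one is
    # set and non-empty, otherwise fall back to the stock default.
    MDSSIZE=${MDSSIZE:-200000}
    OSTSIZE=${OSTSIZE:-200000}
    echo "MDSSIZE=$MDSSIZE OSTSIZE=$OSTSIZE"
}

# Each call runs in a subshell so the assignments do not leak between runs.
( unset MDSSIZE OSTSIZE; show_sizes )               # falls back to the defaults
( MDSSIZE=2000000; OSTSIZE=2000000; show_sizes )    # caller-supplied values win
```

With the real scripts this would correspond to setting MDSSIZE and OSTSIZE alongside MDSCOUNT=4 when setting up the test filesystem, assuming nothing earlier in the configuration has already fixed those values.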
wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/13432
Subject: LU-6088 lmv: Do not revalidate strips with master lock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 715252971aa4839930492520a72eb50b4fa2936b
Hmm, this trace caught my attention:
ls S 0000000000000001 0 33694 1 0x00000080
ffff8801d21fd2f8 0000000000000082 ffff8801d21fd288 ffffffffa092ac0c
ffffffffa0a0b460 ffff8801e5393000 ffff8801d21fd268 ffffffffa093d4f5
ffff8801d21fd2f8 ffffffffa0964af2 ffff8801e5ed25f8 ffff8801d21fdfd8
Call Trace:
[<ffffffffa092ac0c>] ? ptlrpc_request_bufs_pack+0x5c/0x80 [ptlrpc]
[<ffffffffa093d4f5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
[<ffffffffa0964af2>] ? __req_capsule_get+0x162/0x6d0 [ptlrpc]
[<ffffffffa0941d40>] ? lustre_swab_mdt_body+0x0/0x140 [ptlrpc]
[<ffffffffa06e8fe4>] obd_get_request_slot+0x1a4/0x280 [obdclass]
[<ffffffff81064b90>] ? default_wake_function+0x0/0x20
[<ffffffffa0ba11a5>] mdc_enqueue+0x275/0x1a40 [mdc]
[<ffffffffa0b9f25b>] ? mdc_lock_match+0xbb/0x170 [mdc]
[<ffffffffa0ba2b52>] mdc_intent_lock+0x1e2/0x5f9 [mdc]
[<ffffffffa1174af0>] ? ll_md_blocking_ast+0x0/0x7f0 [lustre]
[<ffffffffa0912840>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
[<ffffffffa0b59b32>] lmv_revalidate_slaves+0x482/0x1130 [lmv]
[<ffffffffa1174af0>] ? ll_md_blocking_ast+0x0/0x7f0 [lustre]
[<ffffffffa0b40a7a>] lmv_update_lsm_md+0x1a/0x20 [lmv]
[<ffffffffa11562da>] ll_update_inode+0x134a/0x1e60 [lustre]
[<ffffffffa0b5c3d1>] ? lmv_fld_lookup+0xf1/0x440 [lmv]
[<ffffffff8129456a>] ? strlcpy+0x4a/0x60
[<ffffffffa1156e78>] ll_read_inode2+0x88/0x470 [lustre]
[<ffffffffa11720fb>] ll_iget+0x13b/0x3c0 [lustre]
[<ffffffffa0b3e4b8>] ? lmv_get_lustre_md+0x88/0x300 [lmv]
[<ffffffffa1164fe5>] ll_prep_inode+0x6c5/0xe80 [lustre]
[<ffffffffa0929c4f>] ? ptlrpc_request_cache_free+0xbf/0x100 [ptlrpc]
[<ffffffffa0b59064>] ? lmv_intent_remote+0x444/0xa90 [lmv]
[<ffffffffa0941d40>] ? lustre_swab_mdt_body+0x0/0x140 [ptlrpc]
[<ffffffffa11755d1>] ll_lookup_it_finish+0x2f1/0x11b0 [lustre]
[<ffffffff811749e3>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
[<ffffffffa1171ed9>] ? ll_i2suppgid+0x19/0x30 [lustre]
[<ffffffffa115748c>] ? ll_prep_md_op_data+0x22c/0x530 [lustre]
[<ffffffffa1174af0>] ? ll_md_blocking_ast+0x0/0x7f0 [lustre]
[<ffffffffa1176737>] ll_lookup_it+0x2a7/0x9a0 [lustre]
[<ffffffffa1176eb9>] ll_lookup_nd+0x89/0x5e0 [lustre]
[<ffffffff8119e0f5>] do_lookup+0x1a5/0x230
[<ffffffff8119ed84>] __link_path_walk+0x7a4/0x1000
[<ffffffff8114f89f>] ? handle_pte_fault+0x4af/0xb00
[<ffffffff8119f89a>] path_walk+0x6a/0xe0
[<ffffffff8119faab>] filename_lookup+0x6b/0xc0
[<ffffffff8122db26>] ? security_file_alloc+0x16/0x20
[<ffffffff811a0f84>] do_filp_open+0x104/0xd20
[<ffffffff8129980a>] ? strncpy_from_user+0x4a/0x90
[<ffffffff811ae432>] ? alloc_fd+0x92/0x160
[<ffffffff8118b237>] do_sys_open+0x67/0x130
[<ffffffff8118b340>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Right now, when the slave locks are being revalidated (enqueue, etc.), we do not release the master lock, which is probably not right. I will cook up a patch.
Landed for 2.7