[LU-6092] segment fault and bus error during racer Created: 08/Jan/15 Updated: 01/Apr/22 Resolved: 08/Jan/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Di Wang | Assignee: | WC Triage |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 16956 |
| Description |
|
Although there is no kernel panic, bus errors and segmentation faults occur during racer, even with a single MDT on current master.

== racer test 1: racer on clients: testnode DURATION=300 == 23:16:46 (1420615006)
racers pids: 5216 5217
./file_exec.sh: line 12: 6093 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 8638 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 19954 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 29676 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 46760 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 51388 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 76169 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 96465 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 101751 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 103864 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 113629 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 118052 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 121051 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 8462 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 11135 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 11357 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 51034 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 60066 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 60784 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 68772 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 78361 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 96118 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 97719 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 102173 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 107360 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 116832 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 120715 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
^C./file_exec.sh: line 12: 122723 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 128525 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 17964 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 29243 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 44950 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 46846 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 50790 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 72941 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 84007 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 104448 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 107871 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 113919 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 12260 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 38650 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 39337 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 39855 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 48581 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 52954 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 57474 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 58930 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 65759 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 82054 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 94833 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 103710 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
^C./file_exec.sh: line 12: 8509 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 20436 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 35678 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 43675 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 43916 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 57146 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 62439 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 72712 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 75564 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 78896 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 85944 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
^C./file_exec.sh: line 12: 112681 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 117736 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 130348 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 12633 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 22754 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 26036 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 26263 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 28503 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 31072 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 39816 Segmentation fault (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 51044 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 57162 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 57836 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 62113 Bus error (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
file_create.sh: no process killed
dir_create.sh: no process killed
file_rm.sh: no process killed
file_create.sh: no process killed
file_rename.sh: no process killed
file_link.sh: no process killed
dir_create.sh: no process killed
file_rm.sh: no process killed
file_symlink.sh: no process killed
file_list.sh: no process killed
file_rename.sh: no process killed
file_concat.sh: no process killed
file_link.sh: no process killed
file_exec.sh: no process killed
file_symlink.sh: no process killed
file_list.sh: no process killed
file_chown.sh: no process killed
file_chmod.sh: no process killed
file_concat.sh: no process killed
file_exec.sh: no process killed
file_mknod.sh: no process killed
file_truncate.sh: no process killed
file_chown.sh: no process killed
file_delxattr.sh: no process killed
file_chmod.sh: no process killed
file_mknod.sh: no process killed
file_getxattr.sh: no process killed
file_truncate.sh: no process killed
file_setxattr.sh: no process killed
file_delxattr.sh: no process killed
file_getxattr.sh: no process killed
file_setxattr.sh: no process killed
Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
racer cleanup
sleeping 5 sec ...
there should be NO racer processes:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Filesystem 1K-blocks Used Available Use% Mounted on
testnode@tcp:/lustre 374928 54024 297708 16% /mnt/lustre2
We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
racer cleanup
sleeping 5 sec ...
there should be NO racer processes:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Filesystem 1K-blocks Used Available Use% Mounted on
testnode@tcp:/lustre 374928 54024 297708 16% /mnt/lustre
We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
test_1 returned 129
FAIL 1 (313s)

I ran racer with the following patch:

[root@testnode tests]# git diff
diff --git a/lustre/tests/racer/file_create.sh b/lustre/tests/racer/file_create.sh
index e615365..62eb8bb 100755
--- a/lustre/tests/racer/file_create.sh
+++ b/lustre/tests/racer/file_create.sh
@@ -9,7 +9,7 @@ OSTCOUNT=${OSTCOUNT:-$(lfs df $DIR 2> /dev/null | grep -c OST)}
while /bin/true ; do
file=$((RANDOM % MAX))
# $RANDOM is between 0 and 32767, and we want $blockcount in 64kB units
- blockcount=$((RANDOM * MAX_MB / 32 / 64))
+ blockcount=$((RANDOM % 4))
stripecount=$((RANDOM % (OSTCOUNT + 1)))
[ $OSTCOUNT -gt 0 ] &&
lfs setstripe -c $stripecount $DIR/$file 2> /dev/null
diff --git a/lustre/tests/racer/racer.sh b/lustre/tests/racer/racer.sh
index deef18e..3ed624e 100755
--- a/lustre/tests/racer/racer.sh
+++ b/lustre/tests/racer/racer.sh
@@ -17,7 +17,7 @@ file_list file_concat file_exec file_chown file_chmod file_mknod file_truncate \
file_delxattr file_getxattr file_setxattr"
if [ $MDSCOUNT -gt 1 ]; then
- RACER_PROGS="${RACER_PROGS} dir_remote dir_migrate"
+ RACER_PROGS="${RACER_PROGS} dir_remote"
fi
racer_cleanup()
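
The effect of the blockcount change in file_create.sh can be checked with a little shell arithmetic. This is only a sketch: MAX_MB=8 is an assumed value (this ticket does not show how file_create.sh sets it), and RANDOM_MAX, old_max and new_max are names made up here for illustration.

```shell
#!/bin/bash
# Assumed value for illustration; the real MAX_MB is set elsewhere in file_create.sh.
MAX_MB=${MAX_MB:-8}
# Largest value bash's $RANDOM can return, per the comment in the diff.
RANDOM_MAX=32767

# Upper bound of the original expression: RANDOM * MAX_MB / 32 / 64.
# Shell arithmetic is integer-only, so each division truncates.
old_max=$((RANDOM_MAX * MAX_MB / 32 / 64))

# Upper bound of the patched expression: RANDOM % 4 yields 0..3.
new_max=$((4 - 1))

echo "original: up to $old_max blocks of 64kB ($((old_max * 64)) kB)"
echo "patched:  up to $new_max blocks of 64kB ($((new_max * 64)) kB)"
```

Under that assumption, the patch shrinks the largest file racer creates from roughly MAX_MB megabytes to under 256 kB.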
|
| Comments |
| Comment by Di Wang [ 08/Jan/15 ] |
|
[1/7/15, 6:45:28 PM] John Hammond: This is normal and correct. |
| Comment by Andreas Dilger [ 08/Jan/15 ] |
|
John, shouldn't the open-exec of the executable prevent it from being truncated or overwritten? We've had bugs in the past where users overwrite executables used by long-running jobs (e.g. change code and run make on a login node), and they are sad when said long-running job immediately crashes. |
| Comment by Di Wang [ 08/Jan/15 ] |
|
[1/8/15, 12:31:43 PM] John Hammond: Yes chmod is allowed after open for execute. |