[LU-6092] segment fault and bus error during racer Created: 08/Jan/15  Updated: 01/Apr/22  Resolved: 08/Jan/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Di Wang Assignee: WC Triage
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Blocker
is blocking LU-4712 racer test_1: oops at __d_lookup+0x8c Resolved
Duplicate
is duplicated by LU-6117 racer.sh : ./file_exec.sh: line 12: 2... Closed
Related
is related to LU-8903 Drop "Segmentation fault and Bus erro... Resolved
Severity: 3
Rank (Obsolete): 16956

 Description   

Although there is no kernel panic, some bus errors and segmentation faults occur during racer, even with a single MDT on current master.

== racer test 1: racer on clients: testnode DURATION=300 == 23:16:46 (1420615006)
racers pids: 5216 5217
./file_exec.sh: line 12:  6093 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12:  8638 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 19954 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 29676 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 46760 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 51388 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 76169 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 96465 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 101751 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 103864 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 113629 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 118052 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 121051 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12:  8462 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 11135 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 11357 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 51034 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 60066 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 60784 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 68772 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 78361 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 96118 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 97719 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 102173 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 107360 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 116832 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 120715 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 122723 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 128525 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 17964 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 29243 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 44950 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 46846 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 50790 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 72941 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 84007 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 104448 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 107871 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 113919 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 12260 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 38650 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 39337 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 39855 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 48581 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 52954 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 57474 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 58930 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 65759 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 82054 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 94833 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 103710 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12:  8509 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 20436 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 35678 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 43675 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 43916 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 57146 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 62439 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 72712 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 75564 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 78896 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 85944 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null

./file_exec.sh: line 12: 112681 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 117736 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 130348 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 12633 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 22754 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 26036 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 26263 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 28503 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 31072 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 39816 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 51044 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 57162 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 57836 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 62113 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
file_create.sh: no process killed
dir_create.sh: no process killed
file_rm.sh: no process killed
file_create.sh: no process killed
file_rename.sh: no process killed
file_link.sh: no process killed
dir_create.sh: no process killed
file_rm.sh: no process killed
file_symlink.sh: no process killed
file_list.sh: no process killed
file_rename.sh: no process killed
file_concat.sh: no process killed
file_link.sh: no process killed
file_exec.sh: no process killed
file_symlink.sh: no process killed
file_list.sh: no process killed
file_chown.sh: no process killed
file_chmod.sh: no process killed
file_concat.sh: no process killed
file_exec.sh: no process killed
file_mknod.sh: no process killed
file_truncate.sh: no process killed
file_chown.sh: no process killed
file_delxattr.sh: no process killed
file_chmod.sh: no process killed
file_mknod.sh: no process killed
file_getxattr.sh: no process killed
file_truncate.sh: no process killed
file_setxattr.sh: no process killed
file_delxattr.sh: no process killed
file_getxattr.sh: no process killed
file_setxattr.sh: no process killed
Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
racer cleanup
sleeping 5 sec ...
there should be NO racer processes:
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
Filesystem           1K-blocks  Used Available Use% Mounted on
testnode@tcp:/lustre    374928 54024    297708  16% /mnt/lustre2
We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
Running /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
racer cleanup
sleeping 5 sec ...
there should be NO racer processes:
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
Filesystem           1K-blocks  Used Available Use% Mounted on
testnode@tcp:/lustre    374928 54024    297708  16% /mnt/lustre
We survived /work/lustre-release_new/lustre/tests/racer/racer.sh for 300 seconds.
test_1 returned 129
FAIL 1 (313s)
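Each ./file_exec.sh line in the log above is bash reporting that a child process was killed by a signal, with "(core dumped)" appended when a core file was written. To summarize a saved console log like this one, a small grep-based sketch can count the reports; tally_racer_signals is a hypothetical helper, not part of the racer scripts, and its patterns assume exactly the message format shown above.

```shell
#!/bin/sh
# Sketch: count the signal reports that bash printed for file_exec.sh in a
# saved racer console log. The patterns follow the message format in this
# log; they are not unanchored, so lines with leftover terminal debris
# in front still match.
tally_racer_signals() {
    log=$1
    bus=$(grep -c 'file_exec\.sh: line [0-9]*: *[0-9]* Bus error' "$log")
    segv=$(grep -c 'file_exec\.sh: line [0-9]*: *[0-9]* Segmentation fault' "$log")
    # "(core dumped)" is added by bash only when the child left a core file
    core=$(grep -c 'file_exec\.sh: line .*(core dumped)' "$log")
    echo "SIGBUS=$bus SIGSEGV=$segv core_dumped=$core"
}
```

For example, run `tally_racer_signals racer-console.log`, where racer-console.log stands for wherever the console output was saved.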

I ran racer with the following patch applied:

[root@testnode tests]# git diff
diff --git a/lustre/tests/racer/file_create.sh b/lustre/tests/racer/file_create.sh
index e615365..62eb8bb 100755
--- a/lustre/tests/racer/file_create.sh
+++ b/lustre/tests/racer/file_create.sh
@@ -9,7 +9,7 @@ OSTCOUNT=${OSTCOUNT:-$(lfs df $DIR 2> /dev/null | grep -c OST)}
 while /bin/true ; do 
        file=$((RANDOM % MAX))
        # $RANDOM is between 0 and 32767, and we want $blockcount in 64kB units
-       blockcount=$((RANDOM * MAX_MB / 32 / 64))
+       blockcount=$((RANDOM % 4))
        stripecount=$((RANDOM % (OSTCOUNT + 1)))
        [ $OSTCOUNT -gt 0 ] &&
                lfs setstripe -c $stripecount $DIR/$file 2> /dev/null
diff --git a/lustre/tests/racer/racer.sh b/lustre/tests/racer/racer.sh
index deef18e..3ed624e 100755
--- a/lustre/tests/racer/racer.sh
+++ b/lustre/tests/racer/racer.sh
@@ -17,7 +17,7 @@ file_list file_concat file_exec file_chown file_chmod file_mknod file_truncate \
 file_delxattr file_getxattr file_setxattr"
 
 if [ $MDSCOUNT -gt 1 ]; then
-       RACER_PROGS="${RACER_PROGS} dir_remote dir_migrate"
+       RACER_PROGS="${RACER_PROGS} dir_remote"
 fi
 
 racer_cleanup()
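The first hunk of the patch replaces the size expression in file_create.sh. Assuming, as the comment in that hunk says, that $blockcount counts 64 kB blocks, the sketch below works out the largest file each version can request. The helper names max_kb_original and max_kb_patched are hypothetical, and MAX_MB=10 is an arbitrary illustrative value; the actual MAX_MB setting is not shown in the diff.

```shell
#!/bin/sh
# Sketch: the largest file each blockcount expression in file_create.sh can
# request. Assumes, per the comment in the diff, that one block is 64 kB.
RANDOM_MAX=32767   # bash's $RANDOM ranges over 0..32767
BLOCK_KB=64

# Original: blockcount=$((RANDOM * MAX_MB / 32 / 64)); pass MAX_MB as $1.
# Integer division is applied left to right, as in the original script.
max_kb_original() {
    echo $(( RANDOM_MAX * $1 / 32 / 64 * BLOCK_KB ))
}

# Patched: blockcount=$((RANDOM % 4)), i.e. 0..3 blocks whatever MAX_MB is.
max_kb_patched() {
    echo $(( 3 * BLOCK_KB ))
}

# MAX_MB=10 is only an illustrative value, not necessarily racer's default.
echo "original: up to $(max_kb_original 10) kB"
echo "patched:  up to $(max_kb_patched) kB"
```

With MAX_MB=10 the original expression allows up to 10176 kB (just under 10 MB), while the patched one never exceeds 192 kB, so under the 64 kB-block assumption the patch keeps racer's files very small.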



 Comments   
Comment by Di Wang [ 08/Jan/15 ]

[1/7/15, 6:45:28 PM] John Hammond: This is normal and correct.
[1/7/15, 6:45:59 PM] John Hammond: These executables have been truncated or overwritten.
[1/7/15, 6:46:30 PM] John Hammond: So we should not expect them to execute successfully.

Comment by Andreas Dilger [ 08/Jan/15 ]

John, shouldn't the open-exec of the executable prevent it from being truncated or overwritten? We've had bugs in the past where users overwrite the executables of long-running jobs (e.g. change code and run make on a login node), and they are sad when said long-running job immediately crashes.

Comment by Di Wang [ 08/Jan/15 ]

[1/8/15, 12:31:43 PM] John Hammond: Yes, chmod is allowed after open for execute.
[1/8/15, 12:35:56 PM] John Hammond: The SIGBUS does not necessarily indicate a Lustre bug.
[1/8/15, 12:36:42 PM] John Hammond: q:~# cp /bin/sleep /tmp/
q:~# stat /tmp/sleep
File: `/tmp/sleep'
Size: 27848 Blocks: 56 IO Block: 4096 regular file
Device: fc01h/64513d Inode: 539096 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2015-01-08 14:36:05.682994846 -0600
Modify: 2015-01-08 14:36:05.682994846 -0600
Change: 2015-01-08 14:36:05.682994846 -0600
q:~# truncate --size=10000 /tmp/sleep
q:~# /tmp/sleep
Bus error (core dumped)
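In John's session, "Bus error (core dumped)" is the shell reporting that /tmp/sleep was killed by SIGBUS; the file_exec.sh lines in the racer log are bash's non-interactive form of the same report. A shell encodes "killed by signal N" as exit status 128+N, so a script can recover the signal name. A minimal sketch follows; describe_exit is a hypothetical helper, and the self-signalling sh command merely stands in for a truncated executable.

```shell
#!/bin/sh
# Sketch: run a command and report whether it exited normally or was
# killed by a signal. Shells encode "killed by signal N" as status 128+N.
# A program may also exit with a value above 128 on purpose, so treat
# this as a heuristic.
describe_exit() {
    "$@"
    rc=$?
    if [ "$rc" -gt 128 ]; then
        # kill -l maps a signal number to its name without the SIG prefix
        echo "killed by SIG$(kill -l "$((rc - 128))")"
    else
        echo "exited with status $rc"
    fi
}

# Example: a shell that sends SIGBUS to itself, standing in for a binary
# that faults on a page past the end of its truncated file. Core dumps are
# disabled in the subshell so the demo leaves no core file behind.
(ulimit -c 0; describe_exit sh -c 'kill -s BUS $$')
```

Using `kill -l` rather than a hard-coded table keeps the mapping correct on architectures where SIGBUS has a different number.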

Generated at Sat Feb 10 01:57:09 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.