[LU-13634] racer test 1 failed with ‘Illegal instruction (core dumped)’ Created: 04/Jun/20 Updated: 10/Sep/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
ARM clients |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
racer test_1 fails with the error message 'test_1 failed with 8'. Looking at the failure for an ARM client test, at https://testing.whamcloud.com/test_sets/e8f119a9-57ca-45e3-b8f3-246009fe1b75, in the suite_log we see many ‘illegal instructions’ errors like ./file_exec.sh: line 16: 25000 Illegal instruction (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null racer cleanup sleeping 5 sec ... racer cleanup sleeping 5 sec ... Waited 5, rc=3 sleeping 10 sec ... Waited 5, rc=3 sleeping 10 sec ... Waited 20, rc=3 sleeping 20 sec ... Waited 20, rc=3 sleeping 20 sec ... Waited 50, rc=3 sleeping 40 sec ... Waited 50, rc=3 sleeping 40 sec ... Waited 110, rc=3 sleeping 80 sec ... Waited 110, rc=3 sleeping 80 sec ... Waited 230, rc=3 sleeping 160 sec ... Waited 230, rc=3 sleeping 160 sec ... Waited 470, rc=3 sleeping 320 sec ... Waited 470, rc=3 sleeping 320 sec ... Waited 950, rc=3 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on 10.2.8.103@tcp:/lustre 15466208 221764 14268332 2% /mnt/lustre Waited 950, rc=2 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on 10.2.8.103@tcp:/lustre 15466208 221764 14268332 2% /mnt/lustre Running /usr/lib64/lustre/tests/racer/racer.sh for 900 seconds. CTRL-C to exit Running /usr/lib64/lustre/tests/racer/racer.sh for 900 seconds. CTRL-C to exit layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr ./file_exec.sh: line 16: 29938 Illegal instruction (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null racer cleanup ./file_exec.sh: line 16: 26172 Illegal instruction (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null sleeping 5 sec ... racer cleanup … ./file_exec.sh: line 16: 22980 Illegal instruction (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null ./file_exec.sh: line 16: 7946 Terminated $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null sleeping 5 sec ... racer cleanup sleeping 5 sec ... Waited 5, rc=3 sleeping 10 sec ... Waited 5, rc=3 sleeping 10 sec ... Waited 20, rc=3 sleeping 20 sec ... Waited 20, rc=3 sleeping 20 sec ... Waited 50, rc=3 sleeping 40 sec ... Waited 50, rc=3 sleeping 40 sec ... Waited 110, rc=3 sleeping 80 sec ... Waited 110, rc=3 sleeping 80 sec ... Waited 230, rc=3 sleeping 160 sec ... Waited 230, rc=3 sleeping 160 sec ... Waited 470, rc=3 sleeping 320 sec ... Waited 470, rc=3 sleeping 320 sec ... Waited 950, rc=2 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on 10.2.8.103@tcp:/lustre 15466208 221764 14268332 2% /mnt/lustre2 Waited 950, rc=3 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND Filesystem 1K-blocks Used Available Use% Mounted on 10.2.8.103@tcp:/lustre 15466208 221764 14268332 2% /mnt/lustre2 pid=14004 rc=1 pid=14006 rc=1 pid=14007 rc=1 racer test_1: @@@@@@ FAIL: test_1 failed with 8 We’ve seen these errors in at least three test sessions in the past four months starting with the Lustre version 2.12.4.61 only on the b2_12 branch starting, all during ARM client testing: There is at least one earlier failure like this for 2.13.51.37 on 27 JAN 2020: |
| Comments |
| Comment by Andreas Dilger [ 05/Jun/20 ] |
|
AFAIK, this is fairly normal for racer, as it is doing crazy things to the files. The main goal is not to crash, and to be able to clean up afterward. We should see if syzbot |
| Comment by John Hammond [ 05/Jun/20 ] |
|
Two things here:
|
| Comment by Gerrit Updater [ 10/Sep/20 ] |
|
Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/39878 |