Details
- Bug
- Resolution: Unresolved
- Minor
- None
- Lustre 2.12.5
- None
- ARM clients
- 3
- 9223372036854775807
Description
racer test_1 fails with the error message 'test_1 failed with 8'. Looking at the failure for an ARM client test at https://testing.whamcloud.com/test_sets/e8f119a9-57ca-45e3-b8f3-246009fe1b75, we see many 'Illegal instruction' errors in the suite_log, such as:
./file_exec.sh: line 16: 25000 Illegal instruction (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
racer cleanup
sleeping 5 sec ...
racer cleanup
sleeping 5 sec ...
Waited 5, rc=3
sleeping 10 sec ...
Waited 5, rc=3
sleeping 10 sec ...
Waited 20, rc=3
sleeping 20 sec ...
Waited 20, rc=3
sleeping 20 sec ...
Waited 50, rc=3
sleeping 40 sec ...
Waited 50, rc=3
sleeping 40 sec ...
Waited 110, rc=3
sleeping 80 sec ...
Waited 110, rc=3
sleeping 80 sec ...
Waited 230, rc=3
sleeping 160 sec ...
Waited 230, rc=3
sleeping 160 sec ...
Waited 470, rc=3
sleeping 320 sec ...
Waited 470, rc=3
sleeping 320 sec ...
Waited 950, rc=3
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Filesystem 1K-blocks Used Available Use% Mounted on
10.2.8.103@tcp:/lustre 15466208 221764 14268332 2% /mnt/lustre
Waited 950, rc=2
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Filesystem 1K-blocks Used Available Use% Mounted on
10.2.8.103@tcp:/lustre 15466208 221764 14268332 2% /mnt/lustre
Running /usr/lib64/lustre/tests/racer/racer.sh for 900 seconds. CTRL-C to exit
Running /usr/lib64/lustre/tests/racer/racer.sh for 900 seconds. CTRL-C to exit
layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
./file_exec.sh: line 16: 29938 Illegal instruction (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
racer cleanup
./file_exec.sh: line 16: 26172 Illegal instruction (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
sleeping 5 sec ...
racer cleanup
…
./file_exec.sh: line 16: 22980 Illegal instruction (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 16: 7946 Terminated $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
sleeping 5 sec ...
racer cleanup
sleeping 5 sec ...
Waited 5, rc=3
sleeping 10 sec ...
Waited 5, rc=3
sleeping 10 sec ...
Waited 20, rc=3
sleeping 20 sec ...
Waited 20, rc=3
sleeping 20 sec ...
Waited 50, rc=3
sleeping 40 sec ...
Waited 50, rc=3
sleeping 40 sec ...
Waited 110, rc=3
sleeping 80 sec ...
Waited 110, rc=3
sleeping 80 sec ...
Waited 230, rc=3
sleeping 160 sec ...
Waited 230, rc=3
sleeping 160 sec ...
Waited 470, rc=3
sleeping 320 sec ...
Waited 470, rc=3
sleeping 320 sec ...
Waited 950, rc=2
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Filesystem 1K-blocks Used Available Use% Mounted on
10.2.8.103@tcp:/lustre 15466208 221764 14268332 2% /mnt/lustre2
Waited 950, rc=3
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Filesystem 1K-blocks Used Available Use% Mounted on
10.2.8.103@tcp:/lustre 15466208 221764 14268332 2% /mnt/lustre2
pid=14004 rc=1
pid=14006 rc=1
pid=14007 rc=1
racer test_1: @@@@@@ FAIL: test_1 failed with 8
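When comparing test sessions, it can help to count how many processes died with 'Illegal instruction' in a saved suite_log. The following is only a rough sketch, not part of the Lustre test suite: the names `STATUS_RE` and `summarize_crashes` are hypothetical, and the pattern assumes the shell's job-status message format seen in the log above (`<script>: line <n>: <pid> <status> ...`).

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt in this ticket:
#   ./file_exec.sh: line 16: 25000 Illegal instruction (core dumped) ...
# Only the statuses listed here are recognized; extend the alternation as needed.
STATUS_RE = re.compile(
    r"(?P<script>\S+): line (?P<line>\d+): +(?P<pid>\d+) +"
    r"(?P<status>Illegal instruction|Terminated|Segmentation fault|Killed)"
)


def summarize_crashes(log_text):
    """Return (status -> count, PIDs that died with 'Illegal instruction').

    Uses finditer rather than per-line matching so it also works on
    log text whose line breaks were lost.
    """
    counts = Counter()
    sigill_pids = []
    for match in STATUS_RE.finditer(log_text):
        status = match.group("status")
        counts[status] += 1
        if status == "Illegal instruction":
            sigill_pids.append(int(match.group("pid")))
    return counts, sigill_pids
```

Calling `summarize_crashes()` on the text of a downloaded suite_log returns the per-status counts and the affected PIDs, which can then be compared across the test sessions listed in this ticket.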
We’ve seen these errors in at least three test sessions in the past four months, starting with Lustre version 2.12.4.61, only on the b2_12 branch, and all during ARM client testing:
https://testing.whamcloud.com/test_sets/849d0827-b6ca-46ec-b0be-4727c4bda504
https://testing.whamcloud.com/test_sets/58789911-1a3e-4c3c-ba7f-d1e7216f0ead
There is at least one earlier failure like this for 2.13.51.37 on 27 JAN 2020:
https://testing.whamcloud.com/test_sets/a554c12e-41ca-11ea-9847-52540065bddc