Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13634

racer test 1 failed with ‘Illegal instruction (core dumped)’

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.5
    • None
    • ARM clients
    • 3
    • 9223372036854775807

    Description

      racer test_1 fails with the error message 'test_1 failed with 8'. Looking at the failure for an ARM client test, at https://testing.whamcloud.com/test_sets/e8f119a9-57ca-45e3-b8f3-246009fe1b75, in the suite_log we see many ‘illegal instructions’ errors like

      ./file_exec.sh: line 16: 25000 Illegal instruction     (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      racer cleanup
      sleeping 5 sec ...
      racer cleanup
      sleeping 5 sec ...
      Waited 5, rc=3 sleeping 10 sec ...
      Waited 5, rc=3 sleeping 10 sec ...
      Waited 20, rc=3 sleeping 20 sec ...
      Waited 20, rc=3 sleeping 20 sec ...
      Waited 50, rc=3 sleeping 40 sec ...
      Waited 50, rc=3 sleeping 40 sec ...
      Waited 110, rc=3 sleeping 80 sec ...
      Waited 110, rc=3 sleeping 80 sec ...
      Waited 230, rc=3 sleeping 160 sec ...
      Waited 230, rc=3 sleeping 160 sec ...
      Waited 470, rc=3 sleeping 320 sec ...
      Waited 470, rc=3 sleeping 320 sec ...
      Waited 950, rc=3 USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      Filesystem             1K-blocks   Used Available Use% Mounted on
      10.2.8.103@tcp:/lustre  15466208 221764  14268332   2% /mnt/lustre
      Waited 950, rc=2 USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      Filesystem             1K-blocks   Used Available Use% Mounted on
      10.2.8.103@tcp:/lustre  15466208 221764  14268332   2% /mnt/lustre
      Running /usr/lib64/lustre/tests/racer/racer.sh for 900 seconds. CTRL-C to exit
      Running /usr/lib64/lustre/tests/racer/racer.sh for 900 seconds. CTRL-C to exit
      layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
      layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
      layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
      layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
      layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
      layout: raid0 raid0 pfl pfl pfl dom dom dom flr flr flr
      ./file_exec.sh: line 16: 29938 Illegal instruction     (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      racer cleanup
      ./file_exec.sh: line 16: 26172 Illegal instruction     (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      sleeping 5 sec ...
      racer cleanup
      …
      ./file_exec.sh: line 16: 22980 Illegal instruction     (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      ./file_exec.sh: line 16:  7946 Terminated              $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      sleeping 5 sec ...
      racer cleanup
      sleeping 5 sec ...
      Waited 5, rc=3 sleeping 10 sec ...
      Waited 5, rc=3 sleeping 10 sec ...
      Waited 20, rc=3 sleeping 20 sec ...
      Waited 20, rc=3 sleeping 20 sec ...
      Waited 50, rc=3 sleeping 40 sec ...
      Waited 50, rc=3 sleeping 40 sec ...
      Waited 110, rc=3 sleeping 80 sec ...
      Waited 110, rc=3 sleeping 80 sec ...
      Waited 230, rc=3 sleeping 160 sec ...
      Waited 230, rc=3 sleeping 160 sec ...
      Waited 470, rc=3 sleeping 320 sec ...
      Waited 470, rc=3 sleeping 320 sec ...
      Waited 950, rc=2 USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      Filesystem             1K-blocks   Used Available Use% Mounted on
      10.2.8.103@tcp:/lustre  15466208 221764  14268332   2% /mnt/lustre2
      Waited 950, rc=3 USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      Filesystem             1K-blocks   Used Available Use% Mounted on
      10.2.8.103@tcp:/lustre  15466208 221764  14268332   2% /mnt/lustre2
      pid=14004 rc=1
      pid=14006 rc=1
      pid=14007 rc=1
       racer test_1: @@@@@@ FAIL: test_1 failed with 8 
      

      We’ve seen these errors in at least three test sessions in the past four months starting with the Lustre version 2.12.4.61 only on the b2_12 branch starting, all during ARM client testing:
      https://testing.whamcloud.com/test_sets/849d0827-b6ca-46ec-b0be-4727c4bda504
      https://testing.whamcloud.com/test_sets/58789911-1a3e-4c3c-ba7f-d1e7216f0ead

      There is at least one earlier failure like this for 2.13.51.37 on 27 JAN 2020:
      https://testing.whamcloud.com/test_sets/a554c12e-41ca-11ea-9847-52540065bddc

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: