Lustre / LU-11219

racer test suite fails with no subtests failing


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.0, Lustre 2.10.4, Lustre 2.10.5, Lustre 2.12.1, Lustre 2.12.2, Lustre 2.12.3, Lustre 2.12.4
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      In several test sessions, racer runs to completion and no subtest fails, yet the racer test suite as a whole is reported as failed.

      One recent example of this failure is at
      https://testing.whamcloud.com/test_sets/b7b5a9a4-9911-11e8-b0aa-52540065bddc

      The test_log shows that the failure is raised in test-framework.sh, right after a kill call prints its usage message:

      We survived /usr/lib64/lustre/tests/racer/racer.sh for 900 seconds.
      pid=27203 rc=0
      /usr/lib64/lustre/tests/racer.sh: line 51:  5239 Terminated              $LUSTRE/tests/racer/lss_create.sh
      kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
        Trace dump:
        = /usr/lib64/lustre/tests/racer/lss_destroy.sh:1:main()
      racer: FAIL: test-framework exiting on error
      /usr/lib64/lustre/tests/racer.sh: line 116:  5241 Terminated              $LUSTRE/tests/racer/lss_destroy.sh
      Cleaning test environment ...
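
The usage message from kill in the log above is what a shell prints when kill is called with no PID operand. The following is a minimal sketch of that behaviour, not code taken from racer.sh: the variable `pids` and the helper `safe_kill` are hypothetical names, and the idea that an empty PID list is what happened in this run is an assumption. If the framework treats any non-zero command status as fatal (also an assumption here), an unguarded call like the first one would be enough to fail the suite even though every subtest passed.

```shell
#!/bin/bash
# Hypothetical illustration only; nothing here is copied from racer.sh.

pids=""    # assumption: the list of PIDs to signal ended up empty

# Unquoted, an empty $pids expands to nothing, so kill receives no
# operands. The shell prints a usage message and returns non-zero.
if kill $pids; then
    rc_unguarded=0
else
    rc_unguarded=$?
fi

# Guarded variant (hypothetical helper): skip kill when there is
# nothing to signal, so an empty list is not reported as an error.
safe_kill() {
    [ -n "$1" ] || return 0
    kill $1
}

if safe_kill "$pids"; then
    rc_guarded=0
else
    rc_guarded=$?
fi

echo "unguarded rc=$rc_unguarded, guarded rc=$rc_guarded"
```

Whether a guard of this kind is the right fix depends on why the PID list would be empty in the first place, which this ticket does not establish.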
      

      The client console log on vm2 shows several errors:

      /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
      [41111.677988] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000404:0x16:0x0], use llapi_layout_get_by_path()
      [41116.755137] LustreError: 24455:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000403:0x59:0x0]: -16
      [41119.312286] 0[28282]: segfault at 8 ip 00007f770a3c1958 sp 00007ffe220c1600 error 4 in ld-2.17.so[7f770a3b6000+22000]
      [41126.240557] 16[15401]: segfault at 8 ip 00007f45e23a5958 sp 00007ffc9a44b300 error 4 in ld-2.17.so[7f45e239a000+22000]
      [41137.883169] Lustre: 24030:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1533454793/real 1533454793]  req@ffff9760df0ac600 x1607938745861680/t0(0) o36->lustre-MDT0000-mdc-ffff9760da4b0800@10.9.4.214@tcp:12/10 lens 608/33520 e 0 to 1 dl 1533454800 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
      [41137.886206] Lustre: lustre-MDT0000-mdc-ffff9760da4b0800: Connection to lustre-MDT0000 (at 10.9.4.214@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [41137.893377] Lustre: lustre-MDT0000-mdc-ffff9760da4b0800: Connection restored to 10.9.4.214@tcp (at 10.9.4.214@tcp)
      [41137.895228] Lustre: Skipped 1 previous similar message
      [41153.890232] 1[2862]: segfault at 8 ip 00007f8e5ca81958 sp 00007ffef3527a80 error 4 in ld-2.17.so[7f8e5ca76000+22000]
      [41382.588596] 13[20095]: segfault at 8 ip 00007f615b332958 sp 00007ffe28a69740 error 4 in ld-2.17.so[7f615b327000+22000]
      [41461.276914] LustreError: 10972:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000403:0x831:0x0]: -16
      [41790.723927] Lustre: 23148:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1533455409/real 1533455409]  req@ffff9760e84a3c00 x1607938748994256/t0(0) o36->lustre-MDT0000-mdc-ffff9760ebc0d800@10.9.4.214@tcp:12/10 lens 608/33520 e 0 to 1 dl 1533455453 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
      [41790.726799] Lustre: lustre-MDT0000-mdc-ffff9760ebc0d800: Connection to lustre-MDT0000 (at 10.9.4.214@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [41790.733483] Lustre: lustre-MDT0000-mdc-ffff9760ebc0d800: Connection restored to 10.9.4.214@tcp (at 10.9.4.214@tcp)
      [41886.854201] 10[15730]: segfault at 8 ip 00007fb1ccb98958 sp 00007ffe20b24c50 error 4 in ld-2.17.so[7fb1ccb8d000+22000]
      [41886.854226] 10[15008]: segfault at 8 ip 00007f91e0c82958 sp 00007ffc925a0670 error 4 in ld-2.17.so[7f91e0c77000+22000]
      [41890.860922] LustreError: 20803:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000403:0xf89:0x0]: -16
      [41907.391169] 16[18590]: segfault at 8 ip 00007fb0cd822958 sp 00007ffe5c9d9a60 error 4 in ld-2.17.so[7fb0cd817000+22000]
      

      The client console log on vm1 shows similar errors:

      [41113.825182] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000402:0x50:0x0], use llapi_layout_get_by_path()
      [41119.485080] 13[7719]: segfault at 8 ip 00007fb755e1b958 sp 00007fffe348efb0 error 4 in ld-2.17.so[7fb755e10000+22000]
      [41222.166818] 15[9481]: segfault at 8 ip 00007fedc19f0958 sp 00007fff92668070 error 4 in ld-2.17.so[7fedc19e5000+22000]
      [41386.590121] 7[9668]: segfault at 8 ip 00007ff3a2be0958 sp 00007ffe4c639400 error 4 in ld-2.17.so[7ff3a2bd5000+22000]
      [41567.653415] 0[4503]: segfault at 8 ip 00007f18555e2958 sp 00007ffe555ae4a0 error 4 in ld-2.17.so[7f18555d7000+22000]
      [41623.843544] 4[29403]: segfault at 8 ip 00007fe31436a958 sp 00007fff42768840 error 4 in ld-2.17.so[7fe31435f000+22000]
      [41994.067094] 5[20666]: segfault at 0 ip 0000000000403e5f sp 00007ffc69d4a4e0 error 6 in 5[400000+6000]
      [42082.264466] Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null
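
One detail that may help when comparing these segfaults with other racer tickets: each kernel segfault line gives the faulting ip and the base address of the mapped object, so the offset inside that object is a simple subtraction. The sketch below does this for lines read from stdin; `segv_offsets` is a hypothetical helper name, and the sed pattern assumes the kernel message format shown in the logs above. For every ld-2.17.so fault quoted in this ticket the subtraction gives the same offset, 0xb958. What code lives at that offset is not determined here.

```shell
#!/bin/sh
# Hypothetical helper: for each "segfault ... ip IP ... in NAME[BASE+SIZE]"
# line on stdin, print NAME and the offset of IP from BASE.
segv_offsets() {
    sed -n 's/.* ip \([0-9a-f]*\) .* in \([^[]*\)\[\([0-9a-f]*\)+[0-9a-f]*].*/\1 \2 \3/p' |
    while read -r ip name base; do
        # Both values are hexadecimal without a 0x prefix in the log.
        printf '%s+0x%x\n' "$name" $(( 0x$ip - 0x$base ))
    done
}

# Two lines copied from the console logs above (one from vm2, one from vm1).
segv_offsets <<'EOF'
[41119.312286] 0[28282]: segfault at 8 ip 00007f770a3c1958 sp 00007ffe220c1600 error 4 in ld-2.17.so[7f770a3b6000+22000]
[41119.485080] 13[7719]: segfault at 8 ip 00007fb755e1b958 sp 00007fffe348efb0 error 4 in ld-2.17.so[7fb755e10000+22000]
EOF
# prints:
# ld-2.17.so+0xb958
# ld-2.17.so+0xb958
```

Lines that do not match the pattern, such as the Lustre and LustreError messages, are skipped, so a whole console log can be piped through the function.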
      

      So far, this has only been seen in ZFS testing.

      I think all these error messages have been reported in other racer test failure tickets.

      Other test sessions with the same failure:
      https://testing.whamcloud.com/test_sets/d3254156-9743-11e8-b0aa-52540065bddc
      https://testing.whamcloud.com/test_sets/eace6cec-961c-11e8-8ee3-52540065bddc
      https://testing.whamcloud.com/test_sets/688ba02e-90f9-11e8-87f3-52540065bddc
      https://testing.whamcloud.com/test_sets/3a37c110-860b-11e8-808e-52540065bddc
      https://testing.whamcloud.com/test_sets/df9509ec-6b99-11e8-a522-52540065bddc

            People

              Assignee: WC Triage (wc-triage)
              Reporter: James Nunez (jamesanunez, Inactive)