[LU-11540] racer test 1 fails with 'test_1 failed with 1' Created: 17/Oct/18 Updated: 10/Apr/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.10.5, Lustre 2.12.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | sles12 | ||
| Environment: |
SLES12 SP3 server and clients |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
racer test_1 fails. Looking at the client test_log at https://testing.whamcloud.com/test_sets/fb82b210-cf05-11e8-9238-52540065bddc ,

    pid=8389 rc=0
    pid=8390 rc=0
    pid=8392 rc=254
    pid=8395 rc=0
    pid=8396 rc=0
    pid=8398 rc=0
    pid=8400 rc=0
    pid=8404 rc=0
    racer test_1: @@@@@@ FAIL: test_1 failed with 1

Looking at the racer test 1 code,

    local rpids=""
    for rdir in $RDIRS; do
        do_nodes $clients "DURATION=$DURATION \
            MDSCOUNT=$MDSCOUNT OSTCOUNT=$OSTCOUNT\
            RACER_ENABLE_REMOTE_DIRS=$RACER_ENABLE_REMOTE_DIRS \
            RACER_ENABLE_STRIPED_DIRS=$RACER_ENABLE_STRIPED_DIRS \
            RACER_ENABLE_MIGRATION=$RACER_ENABLE_MIGRATION \
            RACER_ENABLE_PFL=$RACER_ENABLE_PFL \
            RACER_ENABLE_DOM=$RACER_ENABLE_DOM \
            RACER_ENABLE_FLR=$RACER_ENABLE_FLR \
            LFS=$LFS \
            $racer $rdir $NUM_RACER_THREADS" &
        pid=$!
        rpids="$rpids $pid"
    done

    …

    echo racers pids: $rpids
    for pid in $rpids; do
        wait $pid
        rc=$?
        echo "pid=$pid rc=$rc"
        if [ $rc != 0 ]; then
            rrc=$((rrc + 1))
        fi
    done

Looking at both client console logs, we see a problem with fork and systemd-coredump:

    [42276.361879] cgroup: fork rejected by pids controller in /system.slice/xinetd.service
    [42311.453149] LustreError: 28228:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x200000406:0x45:0x0] without object type: valid = 0x100000001
    [42311.453157] LustreError: 28228:0:(llite_lib.c:2407:ll_prep_inode()) new_inode -fatal: rc -12
    [42352.382819] Lustre: 19021:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1539417453/real 1539417453] req@ffff8800621803c0 x1614195941033888/t0(0) o36->lustre-MDT0000-mdc-ffff88007b01c800@10.2.8.135@tcp:12/10 lens 488/4528 e 0 to 1 dl 1539417497 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
    [42352.382831] Lustre: 19021:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message
    [42352.382849] Lustre: lustre-MDT0000-mdc-ffff88007b01c800: Connection to lustre-MDT0000 (at 10.2.8.135@tcp) was lost; in progress operations using this service will wait for recovery to complete
    [42352.389206] Lustre: lustre-MDT0000-mdc-ffff88007b01c800: Connection restored to 10.2.8.135@tcp (at 10.2.8.135@tcp)
    [42352.389212] Lustre: Skipped 1 previous similar message
    [42649.104856] Lustre: 7938:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1539417749/real 1539417749] req@ffff88004543c3c0 x1614195947256656/t0(0) o36->lustre-MDT0000-mdc-ffff88007b01c800@10.2.8.135@tcp:12/10 lens 488/4528 e 0 to 1 dl 1539417793 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
    [42649.104876] Lustre: lustre-MDT0000-mdc-ffff88007b01c800: Connection to lustre-MDT0000 (at 10.2.8.135@tcp) was lost; in progress operations using this service will wait for recovery to complete
    [42649.113726] Lustre: lustre-MDT0000-mdc-ffff88007b01c800: Connection restored to 10.2.8.135@tcp (at 10.2.8.135@tcp)
    [42718.880275] 9[21199]: segfault at 0 ip (null) sp 00007ffe335fa0d8 error 14 in 9[400000+7000]
    [42718.902756] systemd-coredump[21474]: Not enough arguments passed from kernel (0, expected 6).
    [42879.211671] 19[24911]: segfault at 8 ip 00007f777e099b50 sp 00007fffd06ec020 error 4 in ld-2.22.so[7f777e08d000+21000]
    [42879.251815] systemd-coredump[24974]: Not enough arguments passed from kernel (0, expected 6).
    [42880.639477] 19[26804]: segfault at 8 ip 00007f8cfa368418 sp 00007ffffd6e0ae0 error 4 in ld-2.22.so[7f8cfa35d000+21000]

So far, this has only been seen when testing SLES12 SP3 servers and clients.

Logs for more failed racer test suites are at
Although the following has more failures, it looks like we’ve seen this in the b2_10 branch |
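
For reference, the "failed with 1" in the test_log is consistent with the wait loop quoted above: only pid 8392 returned non-zero (rc=254), and the loop adds 1 per failed racer process regardless of the exit value. Below is a minimal sketch of that accounting, not the actual racer code. It assumes rrc starts at 0 (the initialization is not visible in the quoted snippet), and the function name count_failed_jobs and the subshells are hypothetical stand-ins for the do_nodes racer jobs.

```shell
#!/bin/bash
# Hypothetical stand-in for the racer wait loop. Each argument is the exit
# code of one placeholder background job (instead of a real do_nodes call).
count_failed_jobs() {
    local rpids="" pid rc code
    local rrc=0   # assumption: the quoted snippet does not show where rrc is set

    # Start one background job per requested exit code and record its pid.
    for code in "$@"; do
        ( exit "$code" ) &
        rpids="$rpids $!"
    done

    # Same accounting as the quoted loop: every non-zero exit adds 1 to rrc.
    for pid in $rpids; do
        rc=0
        wait "$pid" || rc=$?   # like 'wait $pid; rc=$?', but safe under 'set -e'
        echo "pid=$pid rc=$rc"
        if [ "$rc" != 0 ]; then
            rrc=$((rrc + 1))
        fi
    done

    echo "failed=$rrc"
}

# Eight jobs, one exiting with 254, mirroring the client test_log.
count_failed_jobs 0 0 254 0 0 0 0 0
```

The last line printed is failed=1, so the count identifies how many racer processes failed, not why; the 254 itself has to be traced in the console logs of the affected client.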