[LU-16658] performance sanity test_6 mdsrate-lookup-10dirs - UCX ERROR Created: 22/Mar/23  Updated: 20/Jan/24  Resolved: 20/Jan/24

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Andreas Dilger
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: failing_tests

Issue Links:
Related
is related to LU-16270 performance-sanity test_6: mkdir: can... Open
is related to LU-3786 performance-sanity test_6: mkdir hung... Resolved
is related to LU-16729 replace mdsrate with mdtest in Lustre... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for Cliff White <cwhite@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/d2a273de-f0d8-426b-adb8-ccd3fed94347

The client appears to hang or drop its connection during the initial file/directory creation:

===== mdsrate-lookup-10dirs.sh Test preparation: creating 10 dirs with 12650 files.
+ /usr/lib64/openmpi/bin/mdsrate --mknod --ndirs 10 --dirfmt '/mnt/lustre/mdsrate/lookup-%d' --nfiles 12650 --filefmt 'f%%d'
+ chmod 0777 /mnt/lustre
drwxrwxrwx 4 root root 4096 Jul 19 19:11 /mnt/lustre
+ su mpiuser sh -c "/usr/lib64/openmpi/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh --oversubscribe --oversubscribe -machinefile /tmp/auster.machines -np 10 /usr/lib64/openmpi/bin/mdsrate --mknod --ndirs 10 --dirfmt '/mnt/lustre/mdsrate/lookup-%d' --nfiles 12650 --filefmt 'f%%d' "
0: onyx-91vm11.onyx.whamcloud.com starting at Tue Jul 19 19:11:39 2022
[1658257899.971284] [onyx-91vm12:60266:0]          select.c:514  UCX  ERROR   no active messages transport to <no debug data>: posix/memory - Destination is unreachable, sysv/memory - Destination is unreachable, self/memory0 - Destination is unreachable, tcp/eth0 - Destination is unreachable, tcp/lo - Destination is unreachable
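When triaging similar runs, it can help to split the UCX error line above into its reporting host and the per-transport failure reasons. The following is a minimal sketch in Python, assuming the line layout seen in this one log entry; the `HEADER` pattern and the `parse_ucx_error` helper are hypothetical names introduced for illustration and are not part of UCX, Open MPI, or the Lustre test framework.

```python
import re

# Log line copied from the failing run in this ticket.
UCX_LINE = (
    "[1658257899.971284] [onyx-91vm12:60266:0]          select.c:514  UCX  ERROR   "
    "no active messages transport to <no debug data>: "
    "posix/memory - Destination is unreachable, "
    "sysv/memory - Destination is unreachable, "
    "self/memory0 - Destination is unreachable, "
    "tcp/eth0 - Destination is unreachable, "
    "tcp/lo - Destination is unreachable"
)

# Assumption: this pattern is derived only from the single line above;
# other UCX versions or log levels may format their output differently.
HEADER = re.compile(
    r"^\[(?P<ts>\d+\.\d+)\]\s+"
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+):\d+\]\s+"
    r"(?P<src>\S+:\d+)\s+UCX\s+ERROR\s+(?P<msg>.*)$"
)


def parse_ucx_error(line):
    """Return host, pid, source location, summary, and (transport, reason) pairs.

    Returns None if the line does not look like a UCX ERROR entry.
    """
    match = HEADER.match(line.strip())
    if match is None:
        return None
    # The text before the first ": " is the summary; the rest lists transports.
    summary, _, detail = match.group("msg").partition(": ")
    transports = []
    for item in detail.split(", "):
        name, sep, reason = item.partition(" - ")
        if sep:  # keep only well-formed "transport - reason" items
            transports.append((name.strip(), reason.strip()))
    return {
        "host": match.group("host"),
        "pid": int(match.group("pid")),
        "source": match.group("src"),
        "summary": summary,
        "transports": transports,
    }


if __name__ == "__main__":
    info = parse_ucx_error(UCX_LINE)
    print(info["host"], info["source"])
    for name, reason in info["transports"]:
        print(f"  {name}: {reason}")
```

Applied to the line in this ticket, the sketch shows that the error was reported by onyx-91vm12, while rank 0 started on onyx-91vm11, and that every listed transport (shared memory, self, and both TCP interfaces) was reported as unreachable.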

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
performance-sanity test_6 - Timeout occurred after 610 minutes, last suite running was performance-sanity



Comments
Comment by Gerrit Updater [ 23/Mar/23 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50386
Subject: LU-16658 tests: disable performance-sanity test_6
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c4b23f1975e750e8c8cbe64741b8427d708cb7a6

Comment by Gerrit Updater [ 27/Mar/23 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50425
Subject: LU-16658 tests: disable performance-sanity test_6
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 5a12022bf71e8c9d7e88eea4843ae73dc004840d

Comment by Gerrit Updater [ 04/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50386/
Subject: LU-16658 tests: disable performance-sanity test_6
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5e4897eb6f1c97d4f0120803780904db49c5abe7

Comment by Gerrit Updater [ 11/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50425/
Subject: LU-16658 tests: disable performance-sanity test_6
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 254ee03c017c3457f3f064152ade8695100780ee

Comment by Andreas Dilger [ 20/Jan/24 ]

This has been fixed by the change to use mdtest in place of mdsrate (see LU-16729).

Generated at Sat Feb 10 03:28:54 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.