[LU-15151] conf-sanity test_119: mds1: ssh: Could not resolve hostname mds1: Name or service not known Created: 22/Oct/21  Updated: 05/Nov/21  Resolved: 03/Nov/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Elena Gryaznova
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9699 osp_obd_connect()) ASSERTION( osp->op... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for Elena <elena.gryaznova@hpe.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4f01fe03-3433-4b2b-9ca9-ebce8607c27b

test_119 PASSED, but

mds1: ssh: Could not resolve hostname mds1: Name or service not known

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
conf-sanity test_119 -



Comments
Comment by Andreas Dilger [ 22/Oct/21 ]

It looks like this was introduced by patch https://review.whamcloud.com/27753 "LU-9699 osp: don't assert on OSP duplicating".

The wait_update_cond() call in test_119 should take a hostname as an argument instead of a facet name.
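To illustrate the facet/hostname distinction behind this error, here is a minimal sketch of resolving a facet name such as mds1 to a host before running a remote command. The helper name resolve_facet_host is hypothetical and this is not the test-framework.sh implementation; the fallback from $mds1_HOST to $mds_HOST follows the mapping described later in this thread, and the host name node-a.example is made up.

```shell
#!/bin/sh
# Hypothetical helper (not part of test-framework.sh): map a facet name
# such as "mds1" to the host configured for it.
resolve_facet_host() {
	local facet=$1 var host

	# Accept only names that are safe to use in a variable name,
	# because the lookup below goes through eval.
	case $facet in
		''|*[!A-Za-z0-9_]*) return 2 ;;
	esac

	# First try the facet-specific variable, e.g. $mds1_HOST.
	var="${facet}_HOST"
	eval "host=\${$var:-}"

	# Fall back to the facet type without its index, e.g. $mds_HOST.
	if [ -z "$host" ]; then
		var="${facet%%[0-9]*}_HOST"
		eval "host=\${$var:-}"
	fi

	[ -n "$host" ] || return 1
	printf '%s\n' "$host"
}

# Example with made-up values: mds1 has no host of its own,
# so the lookup falls back to $mds_HOST.
unset mds1_HOST
mds_HOST="node-a.example"
resolve_facet_host mds1
```

Passing the literal string mds1 to ssh or pdsh only works if a host with that name happens to resolve, which would explain the "Could not resolve hostname mds1" message in the description.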

Comment by Gerrit Updater [ 26/Oct/21 ]

"Elena Gryaznova <elena.gryaznova@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45369
Subject: LU-15151 tests: use facet check instead of node check
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 59432387b259d06b0bee1b2e769ee2b639b9c46d

Comment by Gerrit Updater [ 03/Nov/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45369/
Subject: LU-15151 tests: use facet check instead of node check
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dc27b1c3ea1852617012753a3cce1f6c76b164af

Comment by Peter Jones [ 03/Nov/21 ]

Landed for 2.15

Comment by Alex Zhuravlev [ 04/Nov/21 ]

conf-sanity/119 times out every time with the patch:

COMMIT          TESTED  PASSED  FAILED          COMMIT DESCRIPTION
dc27b1c3ea      3       0       3       BAD     LU-15151 tests: use facet check instead of node check
fa36c6b0b9      3       3       0       GOOD    LU-15160 kernel: kernel update SLES12 SP5 [4.12.14-122.91.2]

Comment by Andreas Dilger [ 04/Nov/21 ]

Alex, what do you have in your config for $mds1_HOST (which maps to $mds_HOST if unset)? Can the client $PDSH to that node? (That should be a no-op if it is a single client.)

According to the Gerrit Janitor test logs, the test is taking 1200s to finish, and the wait_update_facet_cond() is continually timing out, but this does not cause an error return:

https://testing-archive.whamcloud.com/gerrit-janitor/19360/results.html

I'm not sure if that is how this test is supposed to work or not, but 1200s is a long time and probably the test should be added to the SLOW list.
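For reference, a minimal sketch of a polling wait whose timeout is visible to the caller. The names wait_for_cond and never_ready are hypothetical, and this is not how wait_update_facet_cond() is implemented; it only shows the pattern of returning non-zero on timeout so the test can decide whether a timeout is fatal.

```shell
#!/bin/sh
# Hypothetical helper: run a command once per second until it succeeds
# or max_wait seconds have passed. Returns 0 on success, 1 on timeout.
wait_for_cond() {
	local max_wait=$1 waited=0
	shift

	while ! "$@"; do
		# Give up once the time budget is used up.
		[ "$waited" -ge "$max_wait" ] && return 1
		sleep 1
		waited=$((waited + 1))
	done
	return 0
}

# Example: a condition that never becomes true, so the wait times out
# and the caller reports it instead of silently continuing.
never_ready() { return 1; }

if ! wait_for_cond 1 never_ready; then
	echo "timed out waiting for never_ready" >&2
fi
```

In a real test, the caller could treat the non-zero return as a failure rather than only logging it, and a much shorter limit than 1200s would keep a stuck condition from dominating the run time.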

Comment by Alex Zhuravlev [ 05/Nov/21 ]

I use the default local.sh as the configuration; this is a single VM. And yes, wait_update_facet_cond() just times out, while I set an explicit time limit for the whole test suite (conf-sanity) in this case. I don't quite understand the test: it doesn't fail if wait_update_facet_cond() times out. What's the point of this waiting?
