[LU-7037] sanity-hsm test_57:gethostbyname("shadow-52vm5") failed Created: 24/Aug/15  Updated: 11/Sep/18  Resolved: 11/Sep/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: DCO Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-6081 hsm: add file migrate support Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for wangdi <di.wang@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/51857356-4849-11e5-a4ad-5254006e85c2.

The sub-test test_57 failed with the following error:

== sanity-hsm test 57: Archive a file with dirty cache on another node == 17:22:03 (1440177723)
CMD: shadow-52vm5 pkill -CONT -x lhsmtool_posix
Purging archive on shadow-52vm5
CMD: shadow-52vm5 rm -rf /tmp/arc1/*
Starting copytool agt1 on shadow-52vm5
CMD: shadow-52vm5 mkdir -p /tmp/arc1
CMD: shadow-52vm5 lhsmtool_posix  --daemon --hsm-root /tmp/arc1 --bandwidth 1 /mnt/lustre2 < /dev/null > /logdir/test_logs/2015-08-21/lustre-reviews-el6_6-x86_64--review-dne-part-2--1_3_1__34138__-70321084859320-121824/sanity-hsm.test_57.copytool_log.shadow-52vm5.log 2>&1
CMD: shadow-52vm5 dd if=/dev/urandom of=/mnt/lustre/d57.sanity-hsm/test_archive_remote bs=1M  count=2 conv=fsync
shadow-52vm5: 2+0 records in
shadow-52vm5: 2+0 records out
shadow-52vm5: 2097152 bytes (2.1 MB) copied, 0.430742 s, 4.9 MB/s
CMD: shadow-52vm5 /usr/bin/lfs hsm_archive -a 2 /mnt/lustre/d57.sanity-hsm/test_archive_remote
pdsh@shadow-52vm6: gethostbyname("shadow-52vm5") failed
 sanity-hsm test_57: @@@@@@ FAIL: hsm_archive failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4750:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4781:error()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:2906:test_57()
  = /usr/lib64/lustre/tests/test-framework.sh:5043:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5080:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4930:run_test()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:2920:main()
Dumping lctl log to /logdir/test_logs/2015-08-21/lustre-reviews-el6_6-x86_64--review-dne-part-2--1_3_1__34138__-70321084859320-121824/sanity-hsm.test_57.*.1440177730.log
CMD: shadow-52vm3,shadow-52vm4,shadow-52vm5,shadow-52vm6.shadow.whamcloud.com,shadow-52vm7 /usr/sbin/lctl dk > /logdir/test_logs/2015-08-21/lustre-reviews-el6_6-x86_64--review-dne-part-2--1_3_1__34138__-70321084859320-121824/sanity-hsm.test_57.debug_log.\$(hostname -s).1440177730.log;
         dmesg > /logdir/test_logs/2015-08-21/lustre-reviews-el6_6-x86_64--review-dne-part-2--1_3_1__34138__-70321084859320-121824/sanity-hsm.test_57.dmesg.\$(hostname -s).1440177730.log
CMD: shadow-52vm5 pkill -INT -x lhsmtool_posix
CMD: shadow-52vm5 pgrep -x lhsmtool_posix

Info required for matching: sanity-hsm 57

This is probably a TEI issue.



 Comments   
Comment by James Nunez (Inactive) [ 24/Aug/15 ]

Here are a few more cases of gethostbyname failures:
sanity test 228c - shadow-2 - 2015-08-21 13:50:53 - https://testing.hpdd.intel.com/test_sets/5a9e7df8-484d-11e5-a4ad-5254006e85c2
sanity 99e - shadow-8 - 2015-08-21 13:54:05 - https://testing.hpdd.intel.com/test_sets/7ae74b22-484c-11e5-a4ad-5254006e85c2
conf-sanity test 34b - shadow-16 - 2015-08-21 14:15:55 - https://testing.hpdd.intel.com/test_sets/d97f9ada-4834-11e5-813b-5254006e85c2
sanity-lfsck - shadow-17 - 2015-08-21 15:07:14 - https://testing.hpdd.intel.com/test_sets/a98e0c26-483e-11e5-8db5-5254006e85c2
sanity-sec test 8 - shadow-45 - 2015-08-21 15:11:15 - https://testing.hpdd.intel.com/test_sets/c4fe4c36-4821-11e5-9e1d-5254006e85c2
ost-pools test 6 - shadow-17 - 2015-08-21 15:44:32 - https://testing.hpdd.intel.com/test_sets/36d18552-4837-11e5-9e1d-5254006e85c2
recovery-small test 101 - shadow-16 - 2015-08-21 16:49:00 - https://testing.hpdd.intel.com/test_sets/dbf21414-4834-11e5-813b-5254006e85c2
sanity-lfsck test 12 - shadow-40 - 2015-08-27 16:41:19 - https://testing.hpdd.intel.com/test_sets/2421641c-4d06-11e5-b77f-5254006e85c2
insanity test 3 - shadow-53 - 2015-08-27 16:59:30 - https://testing.hpdd.intel.com/test_sets/91c27cc6-4ce4-11e5-ae70-5254006e85c2

Comment by James A Simmons [ 31/Aug/15 ]

This is because gethostbyname() is too weak. I found this while testing my LU-5960 work. What does work is getnameinfo(), since that code actually contacts the DNS server. Who is calling that function?
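
For anyone trying to narrow this down outside the test framework, here is a minimal sketch of a pre-flight name-resolution check that could be run on the node issuing the pdsh commands. It is only an illustration: it uses Python's socket.getaddrinfo(), which is not necessarily the call pdsh makes, and the helper names resolve_host and unresolvable are hypothetical.

```python
import socket


def resolve_host(hostname, resolver=socket.getaddrinfo):
    """Return the sorted unique addresses for hostname, or [] if lookup fails.

    The resolver argument is an assumption added for illustration so the
    lookup can be swapped out (for example, in a test).
    """
    try:
        infos = resolver(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Resolution failed (unknown name, temporary DNS failure, ...).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def unresolvable(hostnames, resolver=socket.getaddrinfo):
    """Return the hostnames that could not be resolved, in input order."""
    return [h for h in hostnames if not resolve_host(h, resolver=resolver)]


if __name__ == "__main__":
    # Node names taken from the log above; adjust for the cluster under test.
    nodes = ["shadow-52vm3", "shadow-52vm4", "shadow-52vm5", "shadow-52vm7"]
    for name in unresolvable(nodes):
        print("cannot resolve:", name)
```

Running a check like this in a loop around the time of a failure might show whether the lookup failures are intermittent on the client node or specific to one hostname.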

Comment by nasf (Inactive) [ 03/Dec/15 ]

Similar trouble on master:
https://testing.hpdd.intel.com/test_sets/7e46142e-9955-11e5-9236-5254006e85c2

Comment by James Nunez (Inactive) [ 03/Dec/15 ]

More failures on master:
2015-12-02 18:52:45 - sanity-hsm test_27a - https://testing.hpdd.intel.com/test_sets/7e46142e-9955-11e5-9236-5254006e85c2
2015-12-02 19:11:17 - ost-pools test_23a - https://testing.hpdd.intel.com/test_sets/b989dfde-993c-11e5-802b-5254006e85c2
2015-12-02 19:49:36 - conf-sanity test_90a - https://testing.hpdd.intel.com/test_sets/86c1fe9e-9958-11e5-9236-5254006e85c2
2015-12-09 11:18:22 - conf-sanity test_22 - https://testing.hpdd.intel.com/test_sets/988fd836-9e98-11e5-87a9-5254006e85c2
2015-12-09 08:54:06 - sanityn test_34 - https://testing.hpdd.intel.com/test_sets/97374a32-9e98-11e5-87a9-5254006e85c2
2015-12-15 23:37:18 - sanityn test_46c - https://testing.hpdd.intel.com/test_sets/dd0f8e44-a3ab-11e5-806a-5254006e85c2
2015-12-15 20:07:36 - sanity-sec test_19 - https://testing.hpdd.intel.com/test_sets/9b773c62-a36e-11e5-a3ed-5254006e85c2
2016-01-14 07:44:47 - https://testing.hpdd.intel.com/test_sets/2b7b61b2-bac6-11e5-9137-5254006e85c2
2016-01-14 22:24:18 - https://testing.hpdd.intel.com/test_sets/6ae9e718-bb40-11e5-acbb-5254006e85c2
2016-01-26 16:38:56 - conf-sanity 22 - https://testing.hpdd.intel.com/test_sets/0285473a-c47d-11e5-8866-5254006e85c2

Comment by James A Simmons [ 10/Sep/18 ]

Still a problem?

Comment by Andreas Dilger [ 11/Sep/18 ]

This sub-test search shows that while this test is still failing, recent failures have been annotated with LU-6981, and have a different symptom.
