[LU-6937] sanity-hsm test_106 test_403: error opening lhsmtool_posix: No such file or directory Created: 31/Jul/15  Updated: 04/Aug/17  Resolved: 11/May/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

client and server: lustre-master build # 3120 RHEL7 DNE


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d07a7a8c-3799-11e5-9d53-5254006e85c2.

The sub-test test_106 failed with the following error:

uuid 0 not found in agent list on mds1

test log

== sanity-hsm test 106: Copytool register/unregister ================================================= 13:35:57 (1438202157)
CMD: onyx-30vm5 pkill -CONT -x lhsmtool_posix
Purging archive on onyx-30vm5
CMD: onyx-30vm5 rm -rf /tmp/arc1/*
Starting copytool agt1 on onyx-30vm5
CMD: onyx-30vm5 mkdir -p /tmp/arc1
CMD: onyx-30vm5 lhsmtool_posix  --daemon --hsm-root /tmp/arc1 --bandwidth 1 /mnt/lustre2 < /dev/null > /logdir/test_logs/2015-07-29/lustre-master-el7-x86_64--full--1_7_1__3120__-70238999807760-034047/sanity-hsm.test_106.copytool_log.onyx-30vm5.log 2>&1
CMD: onyx-30vm5 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests//usr/lib64/lustre/tests/utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/usr/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh pgrep -fl lhsmtool_posix 
CMD: onyx-30vm5 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests//usr/lib64/lustre/tests/utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/usr/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh get_client_uuid lhsmtool_posix bash bash sh 
onyx-30vm5: error opening lhsmtool_posix: No such file or directory (2)
onyx-30vm5: cannot get name for `lhsmtool_posix': No such file or directory
onyx-30vm5: error: get_param: llite//uuid: Found no match
CMD: onyx-30vm3 /usr/sbin/lctl get_param -n mdt.lustre-MDT000.hsm.agents |		 grep 0
onyx-30vm3: error: get_param: mdt/lustre-MDT000/hsm/agents: Found no match
 sanity-hsm test_106: @@@@@@ FAIL: uuid 0 not found in agent list on mds1 

Info required for matching: sanity-hsm 106
Info required for matching: sanity-hsm 403



 Comments   
Comment by Saurabh Tandan (Inactive) [ 11/Dec/15 ]

master, build# 3264, 2.7.64 tag
Regression: EL7.1 Server/EL7.1 Client
https://testing.hpdd.intel.com/test_sets/6a2f8cf8-9f37-11e5-ba94-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 16/Dec/15 ]

Server: 2.5.5, b2_5_fe/62
Client: Master, Build# 3266, Tag 2.7.64, RHEL 7
https://testing.hpdd.intel.com/test_sets/0d1a952a-a05f-11e5-90cc-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 18/Dec/15 ]

Another instance for EL7.1 Server/EL7.1 Client - ZFS
Master, build# 3264
https://testing.hpdd.intel.com/test_sets/3688a1f0-a135-11e5-83b8-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 18/Dec/15 ]

Another instance for EL7.1 Server/EL7.1 Client - DNE
Master, Build# 3270
https://testing.hpdd.intel.com/test_sets/57f8b5c6-a26d-11e5-bdef-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 23/Dec/15 ]

Another instance found for:
Server: 2.7.1, b2_7_fe/34
Client: Master , Build# 3276 RHEL 7
https://testing.hpdd.intel.com/test_sets/925ad49c-a5e2-11e5-a028-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 19/Jan/16 ]

Another instance found for interop : 2.7.1 Server/EL7 Client
Server: 2.7.1, b2_7_fe/34
Client: master, build# 3303, RHEL 7
https://testing.hpdd.intel.com/test_sets/2df919c4-bb03-11e5-9137-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 20/Jan/16 ]

Another instance found for interop : 2.5.5 Server/EL7 Client
Server: 2.5.5, b2_5_fe/62
Client: master, build# 3303, RHEL 7
https://testing.hpdd.intel.com/test_sets/60ff2eba-bb0a-11e5-87b4-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 03/Feb/16 ]

Encountered another instance for tag 2.7.66 for FULL - EL7.1 Server/EL7.1 Client, master, build# 3314.
https://testing.hpdd.intel.com/test_sets/9f0a6890-ca88-11e5-84d3-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 04/Feb/16 ]

Another instance for FULL - EL7.1 Server/EL7.1 Client - DNE, master, build# 3314
https://testing.hpdd.intel.com/test_sets/911db61e-cac5-11e5-9609-5254006e85c2

Another instance on master for FULL - EL7.1 Server/EL7.1 Client - ZFS, build# 3314
https://testing.hpdd.intel.com/test_sets/c55bc81c-cb88-11e5-b49e-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 10/Feb/16 ]

Another instance found for interop tag 2.7.66 - 2.7.1 Server/EL7 Client, build# 3316
https://testing.hpdd.intel.com/test_sets/7a4d59a6-ccde-11e5-8b0e-5254006e85c2

Another instance found for interop tag 2.7.66 - 2.5.5 Server/EL7 Client, build# 3316
https://testing.hpdd.intel.com/test_sets/1816312e-ccc8-11e5-b80c-5254006e85c2

Another instance found for Full tag 2.7.66 - EL7.1 Server/EL7.1 Client, build# 3314
https://testing.hpdd.intel.com/test_sets/9f0a6890-ca88-11e5-84d3-5254006e85c2

Another instance found for Full tag 2.7.66 - EL7.1 Server/EL7.1 Client - ZFS, build# 3314
https://testing.hpdd.intel.com/test_sets/c55bc81c-cb88-11e5-b49e-5254006e85c2

Another instance found for Full tag 2.7.66 - EL7.1 Server/EL7.1 Client - DNE, build# 3314
https://testing.hpdd.intel.com/test_sets/911db61e-cac5-11e5-9609-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 24/Feb/16 ]

Another instance found for interop - 2.7.1 Server/EL7 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/3b9722f8-d2f8-11e5-bf08-5254006e85c2
Another instance found for interop - 2.5.5 Server/EL7 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/ba9d84fe-d300-11e5-be5c-5254006e85c2

Comment by Niu Yawei (Inactive) [ 11/Apr/16 ]

Another instance: https://testing.hpdd.intel.com/test_sets/8f52b0de-fe26-11e5-b1c9-5254006e85c2

lhsmtool_posix isn't found. It looks like a testing environment problem.

Comment by Andreas Dilger [ 29/Apr/16 ]

This test is failing quite regularly, about 100 times in the past 4 weeks. Many of the failures are in "full" test runs, so I'm not sure if there is a different test configuration there (RHEL7 vs. RHEL6) or some other potential problem.

Comment by Andreas Dilger [ 29/Apr/16 ]

Both sanity-hsm test_106 and test_403 also appear to be failing much more frequently on the PFL patches. I'm not sure why that is, but there are as many failures on PFL patches as on all other patches put together, so something in the PFL patches may be triggering this problem more frequently.

Comment by Sarah Liu [ 04/May/16 ]

From the previous comments, it looks like this issue only affects RHEL7.

Comment by John Hammond [ 10/May/16 ]

This is likely due to a change in the output of pgrep. With the pgrep from CentOS 6.7:

# pgrep -V
pgrep (procps version 3.2.8)
# pgrep -fl lhsmtool_posix
24726 lhsmtool_posix --daemon --hsm-root /tmp/arc1/shsm --bandwidth 1 /mnt/lustre2

With a newer pgrep:

# pgrep -V
pgrep from procps-ng 3.3.9
# pgrep -fl lhsmtool_posix
24735 lhsmtool_posix
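
Since get_agent_uuid() recovers the copytool's mount point from that full command line, the truncated procps-ng output breaks the lookup, which cascades into the "llite//uuid: Found no match" errors shown in the description. A minimal sketch of a ps-based alternative along the lines of the fix below (illustrative only; the actual patch may use a different invocation):

# ps -e -o pid=,args= | grep '[l]hsmtool_posix'
24735 lhsmtool_posix --daemon --hsm-root /tmp/arc1 --bandwidth 1 /mnt/lustre2

ps -o args= prints the full argument list under both procps 3.2.8 and procps-ng 3.3.9, and the bracketed '[l]' keeps grep from matching its own entry in the process table. The output line is illustrative, reusing the command line from the CentOS 6.7 example above.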
Comment by Gerrit Updater [ 10/May/16 ]

John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/20093
Subject: LU-6937 test: use ps in get_agent_uuid()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3b81057aadaf389e329320d1f8275f79425159b1

Comment by Gerrit Updater [ 10/May/16 ]

Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/20096
Subject: LU-6937 test: use ps in get_agent_uuid()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bc23e9b1983124e36939dbd1fecce15d010fda61

Comment by Gerrit Updater [ 11/May/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20093/
Subject: LU-6937 test: use ps in get_agent_uuid()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: af4411d6afa93acccebd791b6e974fd84ca8eb33

Comment by Andreas Dilger [ 11/May/16 ]

It's too bad that this was failing for so long on RHEL7 but we didn't fix it until it broke master testing... At least it is now fixed.

Comment by Gerrit Updater [ 20/May/16 ]

Doug Oucharek (doug.s.oucharek@intel.com) uploaded a new patch: http://review.whamcloud.com/20341
Subject: LU-6937 test: use ps in get_agent_uuid()
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: e28e633b16bd7bee2994614872b2b65992367d0a

Comment by Gerrit Updater [ 30/May/16 ]

Amir Shehata (amir.shehata@intel.com) merged in patch http://review.whamcloud.com/20341/
Subject: LU-6937 test: use ps in get_agent_uuid()
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: 02e907afe1bc79fa32272b0afa7393d14bab35f8

Comment by Gerrit Updater [ 27/Jul/16 ]

Jinshan Xiong (jinshan.xiong@intel.com) merged in patch http://review.whamcloud.com/20295/
Subject: LU-6937 test: use ps in get_agent_uuid()
Project: fs/lustre-dev
Branch: pfl
Current Patch Set:
Commit: fe9bd6cc3c02bad282f4dd7f8a1779764d3ca4ab

Comment by nasf (Inactive) [ 05/Aug/16 ]

Hit it on b2_8:
https://testing.hpdd.intel.com/test_sets/750f521e-5aeb-11e6-aa74-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 06/Sep/16 ]

Still hit it on master:
https://testing.hpdd.intel.com/test_sets/a5a1ecf0-7249-11e6-b08e-5254006e85c2
